COOPERATIVE GRADUATE SUMMER SESSIONS IN STATISTICS


Beginning in 1954, North Carolina State College, the University of 
Florida, Virginia Polytechnic Institute, and the Southern Regional Education 
Board will sponsor cooperative Graduate Summer Sessions in Statistics. The 
first session will be held at Virginia Polytechnic Institute, Blacksburg, Va., 
June 9-July 17, 1954. Additional sessions are tentatively planned for North 
Carolina State College and the University of Florida. 

The sessions, based upon a recommendation of the Southern Regional 
Education Board’s Commission on Statistics, will be of interest to (1) research 
and professional workers desiring instruction in basic statistical concepts, 
(2) teachers desiring training in modern statistics, (3) prospective candidates 
for graduate degrees in statistics, (4) graduate students in other fields wanting 
work in statistics, and (5) statisticians wishing to keep informed about ad- 
vanced theory and methods. 

Each session will last six weeks; each course will carry three semester 
hours of graduate credit, with a maximum of six credits per summer. The 
scheduling of courses will permit consecutive work in successive summers. 
The work may be applied at any of the cooperating institutions toward the 
requirements for a Master’s degree. Doctoral candidates should consult with 
their institutions regarding the applicability of the courses. 

During the first session Professor Maurice Kendall of the University of 
London will teach Multivariate Analysis, and Dr. Ralph Comstock of North 
Carolina State College, Quantitative Genetics. The staff of the Virginia 
Polytechnic Institute’s Department of Statistics will offer such courses as 
Probability and Inference, Analysis of Variance, Statistical Methods, Engi- 
neering Statistics, Educational Statistics, Rank Order Statistics, and the 
Theory of Sequential Methods. The department includes R. A. Bradley, 
D. B. Duncan, M. C. K. Tweedie, P. M. Somerville, and Boyd Harshbarger. 
Other statisticians will direct afternoon seminars. Advanced courses in the 
agricultural, science, and engineering divisions of the College will be available. 

The fee is $30.00. Board, room, post office box, and laundry for the entire 
session may be had for $76.40. Inquiries should be addressed to Boyd Harsh- 
barger, Head, Department of Statistics, Virginia Polytechnic Institute, 
Blacksburg, Virginia. 
WHO BELONGS IN THE FAMILY?*
ROBERT L. THORNDIKE
TEACHERS COLLEGE, COLUMBIA UNIVERSITY 


I was sitting before my TV set, a while back, watching Captain Video 
and pondering the organizational problems of psychologists, psychometricians, 
psychodiagnosticians, psycho-somatists, psychosomnabulists, and psycho- 
ceramics (crack-pots to you). Wondering what I might do, in my small way, 
to help out, I decided to enlist Captain Video’s help to bring me from the 
Black Planet that super-galactian hypermetrician, Dr. Idnozs Heahscror- 
Tenib, cosmos-famous discoverer of Serutan. 

Why delay? The Galaxy was on its way, and in half a light year Dr. Tenib 
was at my side prepared to devote his gargantuan talents to the task. 

Seeing no point in confusing the good doctor by trying to describe to him 
the present administrative hodge-podge, I said, “Doctor, let’s start from
scratch. I want you to find out for me how these good people who are present
at the annual meeting of the APA structure themselves. What families are
represented? How many, or better, how few? And who belongs to each?”

“We proceed,” said the Doctor. “Bring sample of population; I measure.” 

So we set out to design a sample. The problem presented some interesting 
theoretical aspects, but the final solution was relatively simple. We stationed 
representatives at each of the three state beverage stores and followed every 
third badge-wearing individual who came out of a store. We selected only out- 
going patrons for obvious reasons. 

After assisting each respondent to unburden himself, we brought him to 
Dr. Idnozs (as we came to call him among ourselves) for study. 

“Now,” murmured the Doctor, “we give tests. First is ‘Draw-a-Psychiatrist
Test.’”

“We score this,” he confided, “by if it gives horns.”

Presently we started on the physiological test battery. 

“We draw off saliva drop by drop,” explained our idiot savant, “and see 
does he drool when we bring in Skinner Box.” 

Later came the Peculiar Preference Blank. 

“Forced-choice, you know,” whispered the Doctor. “Would you rather
make mud pies or kiss gorgeous blonde?”


*Presidential address to the Psychometric Society, September 7, 1953. 
“Doctor,” I said, “let’s not get personal.”

Time will not permit a full description of the Doctor’s ingenious test 
battery. It will be fully elaborated in a forthcoming issue of the Journal of 
Ortho-Personometrics. Needless to say, the tests were all orthogonal, com- 
pletely diagnostic, of highest reliability, and representative of the fundamental 
dimensions of psycho-personality (the personality of psychologists and
psychopaths).

I must also skip over with only passing mention the unique procedures 
by which the Doctor established fundamental equal-unit scales for the different 
dimensions included in his battery, and how he provided for equivalence of 
metric from one dimension to another. 

“Is simple,” said the good Doctor. “Take a number from one to ten. Is
a score. Single digit. Standardized. When I say one equals one, one equals
one.”

“What now, Doctor?” I asked. “Do we run a Q-type factor analysis to
locate the dimensions and clusters in our sample?”

“Is no good,” replied my mentor. “Neglects differences in score level.
Washes out differences in variability. Indicates dimensions, but doesn’t
locate boundary of clusters.”

“Well, then, shall we calculate a multiple discriminant function?” 

“No good. Have no a priori groups. Multiple discriminant only perpetuates
sins of the fathers. (Remind me I tell you sometime about my father.)
Tells which Division to put man in. Not tell what Divisions should be.”

“What then?” 

“We run cluster analysis. Find distances between sheep and goats. 
Assign to clusters so that average of distances within cluster is minimum, 
when summed over all clusters. Define families, boundaries, and family 
membership like so.” 

And so that is what we did. We had the set of scores for each person. As 
I mentioned before, thanks to Dr. T’s skill they had been designed so that 
they were orthogonal measures, so we didn’t have to worry about the effects 
of covariance. And we were also fortunate in that the problems of a metric 
had been worked out for us by the giant brain. It was, therefore, a simple step 
to express the ‘distance’ between any pair of persons as the square root of 
the sum of squares of the score differences on each one of the tests. The 
problem that remained was merely that of selecting from the N-square matrix 
of between-persons distances k sub-sets chosen in such a way that the average 
of the distances within sub-sets, summed over all k of them, was a minimum. 
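The distance just described, the square root of the summed squared score differences, can be sketched as follows. This is only an illustration: the function name `distance_matrix` and the three sets of scores are hypothetical, not data from the study.

```python
import math

def distance_matrix(scores):
    """Return the N x N matrix of Euclidean distances between rows of `scores`."""
    n = len(scores)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Square root of the sum of squared score differences over all tests.
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(scores[i], scores[j])))
            dist[i][j] = dist[j][i] = d
    return dist

# Hypothetical scores of three persons on two orthogonal tests (illustration only).
scores = [[0, 0], [3, 4], [6, 8]]
dist = distance_matrix(scores)
```

The matrix is symmetric with a zero diagonal, so only the N(N - 1)/2 entries above the diagonal carry information.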

“Have showed you how,” said the Doctor. “Now I go. Is dinner time on
Black Planet.”

“But, Doctor,” I expostulated, “how do I go about identifying the optimum
k sub-sets?”
“Is easy. Finite number of combinations. Only 563 billion billion billion.
Try all. Keep best.”

I acknowledged the cogency of his method, then rallied feebly for one last
question.

“But, Doctor, how shall I tell how many families there are? How many
clusters there should be?”

“Is dinner time. Don’t bother me.” And the good Doctor vanished rapidly
into the stardust of outer space.





Dr. T had departed, but the problem we had faced together lingered with 
me. 

Suppose we have a set of specimens—of psychologists, of psychopaths, 
of jobs, or whatever. Suppose we have a set of measures of each person, job, 
or the like. Suppose for the moment that questions which may be raised about 
the representativeness of the measures, their independence, their metrics 
have all been satisfactorily answered. Suppose that we have computed a 
scalar distance between each of the specimens in the m-space represented by 
our m measures. Suppose we wish to subdivide our N specimens into k subsets 
in such a way that the subsets shall be as compact and homogeneous as 
possible. Suppose we define compactness by requiring that the average of all 
the distances between specimens within the same subset shall be a minimum. 
That is, we want the members of each family to be as much alike as possible 
with respect to the set of measures which we have elected to study. How, then, 
shall we decide upon the value of k, the number of families or clusters? Is
there any meaningful way of defining an appropriate, or natural, or “optimum” 
number of clusters? And once k has been determined, how shall we decide 
upon the boundaries and the centroids of the various clusters? How shall we 
tell where one should end and the next begin? Who belongs in which family? 

These appear to be genuine problems, with real meaning in a number of 
practical contexts. Some solution must be arrived at by the dress designer 
engaged in manufacturing clothes, who must decide on the number of different 
sizes for women’s clothes and the dimensions for each. Some solution must be 
reached by the military personnel specialist who must identify groups and 
families of jobs in the military services in planning testing batteries, classifica- 
tion systems, and career guidance programs. A solution is implied in the work 
of those sociologists who undertake to identify the class structure of a com- 
munity and delimit the class membership of individuals. 

Let us start with the second problem first, because it looks somewhat more 
docile and amenable to attack. The problem is: For a given value of k, how 
shall we assign N specimens to k categories so that the average of the within- 
categories distances will be a minimum? So that there will be as much likeness 
within families and as much difference between families as possible? 
Dr. T. has already given us the simon-pure mathematician’s answer. The
number of combinations is finite. Try them all and pick the best. But that 
solution is not very comforting. Though finite to the mathematician, the 
number of combinations is without limit for the man who must work with the 
data. With only 10 specimens and two clusters, the number of possible com- 
binations is over a thousand, and the number increases at a rapidly accelerat- 
ing rate with increase in either N or k. 
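The size of the search can be made concrete with a short count. The sketch below rests on two assumptions about what is being counted: with labeled clusters, each of 10 specimens may go to either of two clusters, giving 2^10 assignments, which agrees with "over a thousand"; if the clusters are unlabeled and must be non-empty, the count is the Stirling number of the second kind. The function name `stirling2` is an illustrative choice.

```python
def stirling2(n, k):
    """Ways to split n distinct specimens into k non-empty, unlabeled clusters."""
    # Recurrence: S(n, k) = k * S(n - 1, k) + S(n - 1, k - 1), with S(0, 0) = 1.
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][k]

labeled_10_2 = 2 ** 10                 # every specimen assigned to cluster 1 or 2
partitions_10_2 = stirling2(10, 2)     # unlabeled, non-empty clusters
partitions_12_3 = stirling2(12, 3)     # the 12-job, three-family case used later
```

Under the unlabeled reading, 10 specimens in two clusters give 511 groupings, and 12 jobs in three families already give 86,526.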

The mathematicians in my family also assure me that there is no analytic 
mathematical solution to this problem. We appear to be thrown back on
iterative approximation procedures. 

The exploratory work we have done suggests that such procedures can be 
developed in a form which is not too laborious, and which converges relatively 
promptly to a stable solution. From here on in, I would like to illustrate the 
process with a miniature set of data from analyses which we have been doing 
with a view to defining more rationally the family relationship of Air Force 
jobs. These particular data have a number of shortcomings, so no particular
weight should be attached to the substantive results. 

The basic data consist of the ratings of each of 12 Air Force job categories 
with respect to 19 dimensions. The dimensions were selected on the basis of a 
rather extensive correlational analysis of 130 attributes which have been 
applied to jobs in job descriptions and elsewhere. The 19 dimensions were 
chosen as being to a large extent mutually independent, fairly reliably rated, 
significant for a number of Air Force jobs, and differentially significant for 
different jobs. 

The average rating of each job on a scale from 0 to 9 is shown in Table 1. 
Ratings of the jobs were made by four or more supervisory non-coms. The 
inter-job distances are presented in Table 2. We report here only one particular 
case—that of three families for the set of 12 jobs. 

Our procedure is to assume that the two jobs which are at the greatest 
distance from one another will axiomatically fall in different families. The 
third cluster starts with the job which is least near to either of the other two. 
Each cluster is built up by adding on that specimen which is nearest to the one 
which initially defined the cluster. A specimen is added to each cluster in turn, 
and the cycle is repeated until all specimens are assigned. We then have a set 
of initial clusters of equal size, and we can determine for each specimen its 
average distance from the members of its own cluster and of the other clusters. 
This situation is shown in Table 3. 
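The starting procedure just described can be sketched in a few lines. Two points are assumptions of this sketch rather than statements of the text: "least near to either of the other two" is read as the job whose nearer seed is farthest away, and the clusters take turns in the order in which their seeds were chosen. The distances are transcribed from Table 2, reading that table as symmetric; the name `initial_clusters` is illustrative.

```python
# Inter-job distances (x10), transcribed from Table 2 and read as symmetric.
D = [
    [0, 62, 99, 96, 104, 118, 128, 137, 134, 140, 84, 105],
    [62, 0, 73, 75, 78, 104, 109, 125, 119, 130, 55, 95],
    [99, 73, 0, 51, 95, 84, 90, 95, 83, 118, 50, 84],
    [96, 75, 51, 0, 99, 64, 67, 77, 67, 89, 60, 75],
    [104, 78, 95, 99, 0, 149, 153, 166, 142, 167, 64, 79],
    [118, 104, 84, 64, 149, 0, 35, 28, 83, 60, 101, 125],
    [128, 109, 90, 67, 153, 35, 0, 41, 80, 58, 109, 129],
    [137, 125, 95, 77, 166, 28, 41, 0, 80, 57, 119, 141],
    [134, 119, 83, 67, 142, 83, 80, 80, 0, 100, 108, 93],
    [140, 130, 118, 89, 167, 60, 58, 57, 100, 0, 132, 145],
    [84, 55, 50, 60, 64, 101, 109, 119, 108, 132, 0, 76],
    [105, 95, 84, 75, 79, 125, 129, 141, 93, 145, 76, 0],
]

def initial_clusters(dist, k=3):
    n = len(dist)
    # First two seeds: the pair of specimens at the greatest distance.
    a, b = max(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda p: dist[p[0]][p[1]])
    seeds = [a, b]
    # Further seeds: the specimen whose nearest seed is farthest away (assumed reading).
    while len(seeds) < k:
        rest = [i for i in range(n) if i not in seeds]
        seeds.append(max(rest, key=lambda i: min(dist[i][s] for s in seeds)))
    clusters = [[s] for s in seeds]
    free = set(range(n)) - set(seeds)
    # Each cluster in turn takes the unassigned specimen nearest to its seed.
    while free:
        for c in clusters:
            if not free:
                break
            pick = min(free, key=lambda i: dist[i][c[0]])
            c.append(pick)
            free.remove(pick)
    return clusters

# Report job numbers (1-based) for each initial cluster.
groups = [sorted(j + 1 for j in c) for c in initial_clusters(D)]
```

With these readings the seeds are jobs 5, 10, and 1, and the three groups that result have the same membership as the initial clusters of Table 3.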

Generally speaking, a specimen is mis-assigned if it is closer to the mem- 
bers of another cluster than to the members of its own. Such a situation is 
illustrated by the job of General Instructor. Cases of this sort are re-assigned, 
one at a time, starting with the most obvious misfits, and the average distances 
are recomputed after each assignment. (This is actually a good deal less 
laborious than it sounds.) Shifts are made until there is no further shift which 










TABLE 1
Average Ratings of 12 Air Force Specialties on Requirement of 19 Attributes

Attribute





6.0 4.1 5.4 4.4 7.6 1.2 1.8 0.5 2.9 0.5 6.0 6.5 


5.5 6.8 5.6 3.5 6.5 1.6 0.5 1.4 1.1 0.2 5.8 3.0 


1 Strength .... 
2 Tools 


3 Fluency of 


5.0 4.0 5.1 5.7 5.0 5.1 5.3 5.4 8.5 5.5 3.8 7.8 
7.5 7.8 8.2 7.4 7.0 7.8 8.3 7.6 7.0 9.0 7.0 8.0 


Expression . . 


4 Accuracy 


5 Manipulative 


1.8 6.2 5.0 2.8 


8.5 5.4 4.5 4.4 5.0 68 4.8 6.4 


Ability 
6 Responsibility for 


7.0 8.2 


Work of Others 6.0 5.8 7.0 7.9 7.5 6.0 6.5 7.0 8.2 6.5 


7 Emotional Control 6.0 6.2 7.0 7.6 8.5 6.8 7.0 7.2 8.2 


8 Speed . 


1.2 7.0 S38 


5:8 6.0 7.5 6:3 7.5 6.2 6.4 6.4 6.1 7.0 6:5 620 


9 Foot-Hand 


1.8 3.6 3.0 2.8 7.5 0.4 0.2 0.1 2.3 0.2 3.5 5.4 


Coordination . 


10 Work under 


Dangerous 


7.8 


7.9 4.8 9.0 4.5 4.4 


4.0 3.9 3.7 4.2 7.0 2.5 09 1.2 2.5 2.0 6.2 


Conditions 
11 Clerical Perception 3.0 4.6 4.2 7.5 4.0 7.8 8.1 


12 Concentration amid 


6.5 
7 


7.4 


6.8 
8.5 6.4 6.6 6.8 5.5 6.4 6.0 6.9 


(2 6:0 6.5 ‘630 750 729 


Distraction . . 


13 Induction 


C3 5.2 80 
14 Interpreting Maps,


1.2 2.2 3.0 3.0 5.0 


1.1 2.8 4.2 2.3 2.2 


8.0 5.1 


Diagrams, etc. 
15 Spatial Judgment 4.2 5.8 2.5 3.1 6.0 0.8 1.5 0.0 1.2 0.2 3.8 3.9 


16 Flexibility 
17 Arithmetic 


7.8 6.0 6.5 


5.3 5.6 6.3 7.0 5.6 5.0 6.3 7.4 


6.5 


9.0 4.2 3.5 


4.2 65.2 3.6 4.9 5.8 4.4 5.4 4.8 4.2 


Computation . 
18 Social Adaptability 7.0 6.0 8.0 7.6 6.5 


19 Actuating Multiple 


7.5 8.5 


2 7.4 8.5 6.2 
TABLE 2
Inter-Job Distances of 12 Air Force Jobs*

Air Force Jobs                      1    2    3    4    5    6    7    8    9   10   11   12
 1 Radio Mechanic                   -   62   99   96  104  118  128  137  134  140   84  105
 2 Aircraft Mechanic               62    -   73   75   78  104  109  125  119  130   55   95
 3 [name illegible]                99   73    -   51   95   84   90   95   83  118   50   84
 4 Supply Technician               96   75   51    -   99   64   67   77   67   89   60   75
 5 Petroleum Supply Technician    104   78   95   99    -  149  153  166  142  167   64   79
 6 [name illegible]               118  104   84   64  149    -   35   28   83   60  101  125
 7 Career Guidance Specialist     128  109   90   67  153   35    -   41   80   58  109  129
 8 Personnel Specialist           137  125   95   77  166   28   41    -   80   57  119  141
 9 General Instructor             134  119   83   67  142   83   80   80    -  100  108   93
10 Budget & Fiscal Clerk          140  130  118   89  167   60   58   57  100    -  132  145
11 Medical Corpsman                84   55   50   60   64  101  109  119  108  132    -   76
12 Air Policeman                  105   95   84   75   79  125  129  141   93  145   76    -

*Multiplied by 10 to remove decimal.
TABLE 3
Initial Grouping Into Three Clusters, Showing Cluster Membership and
Average Distance of Each Job from Jobs in Each Cluster

                                                    Clusters
Job                                      A                 B                  C
                                  Jobs 1, 2, 4, 9   Jobs 3, 5, 11, 12   Jobs 6, 7, 8, 10
 1 Radio Mechanic                       97*               98                131
 2 Aircraft Mechanic                    85*               75                117
 3 [name illegible]                     76                76*                97
 4 Supply Technician                    79*               71                 74
 5 Petroleum Supply Technician         106                79*               159
 6 [name illegible]                     92               115                 41*
 7 Career Guidance Specialist           96               120                 45*
 8 Personnel Specialist                105               130                 42*
 9 General Instructor                  107*              106                 86
10 Budget & Fiscal Clerk               115               140                 59*
11 Medical Corpsman                     72                63*               115
12 Air Policeman                        97                80*               135

*Asterisk indicates cluster to which each job is assigned.
will reduce the average of all the within-cluster distances. This is the situation
which we find in Table 4. This appears to be a uniquely best assignment of the 
12 jobs to three families, in the sense that we have defined best. 


TABLE 4
Final Grouping into Three Clusters

                                                    Clusters
Job                                      A                 B                  C
                                   Jobs 1, 2, 5     Jobs 3, 4, 11, 12   Jobs 6, 7, 8, 9, 10
 1 Radio Mechanic                       83*               96                131
 2 Aircraft Mechanic                    70*               74                117
 3 [name illegible]                     89                62*                94
 4 Supply Technician                    90                62*                73
 5 Petroleum Supply Technician          91*†              84                155
 6 [name illegible]                    124                94                 52*
 7 Career Guidance Specialist          130                99                 54*
 8 Personnel Specialist                143               108                 52*
 9 General Instructor                  132                88                 86*
10 Budget & Fiscal Clerk               146               121                 69*
11 Medical Corpsman                     68                62*               114
12 Air Policeman                        93                78*               127

*Asterisk indicates cluster to which each job is assigned.

†Job 5 (Petroleum Supply Technician) is assigned to Cluster A rather than Cluster B because, due to the
small size of the cluster, it has less effect on the over-all average distance in that cluster than it would in the
larger Cluster B.
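The one-at-a-time reassignment can be sketched as a simple search. The sketch assumes that the quantity being reduced is the average of all within-cluster distances pooled over the clusters, and that at each step the single shift giving the largest reduction is made; the text names the "most obvious misfits" first, so the exact order of shifts here may differ from the hand procedure. Distances are transcribed from Table 2 and read as symmetric; the function names are illustrative.

```python
# Inter-job distances (x10), transcribed from Table 2 and read as symmetric.
D = [
    [0, 62, 99, 96, 104, 118, 128, 137, 134, 140, 84, 105],
    [62, 0, 73, 75, 78, 104, 109, 125, 119, 130, 55, 95],
    [99, 73, 0, 51, 95, 84, 90, 95, 83, 118, 50, 84],
    [96, 75, 51, 0, 99, 64, 67, 77, 67, 89, 60, 75],
    [104, 78, 95, 99, 0, 149, 153, 166, 142, 167, 64, 79],
    [118, 104, 84, 64, 149, 0, 35, 28, 83, 60, 101, 125],
    [128, 109, 90, 67, 153, 35, 0, 41, 80, 58, 109, 129],
    [137, 125, 95, 77, 166, 28, 41, 0, 80, 57, 119, 141],
    [134, 119, 83, 67, 142, 83, 80, 80, 0, 100, 108, 93],
    [140, 130, 118, 89, 167, 60, 58, 57, 100, 0, 132, 145],
    [84, 55, 50, 60, 64, 101, 109, 119, 108, 132, 0, 76],
    [105, 95, 84, 75, 79, 125, 129, 141, 93, 145, 76, 0],
]

def pooled_average(clusters, dist):
    """Average of all within-cluster pairwise distances, pooled over clusters."""
    pairs = [(i, j) for c in clusters for i in c for j in c if i < j]
    return sum(dist[i][j] for i, j in pairs) / len(pairs)

def reassign(clusters, dist):
    """Shift one specimen at a time while some shift lowers the pooled average."""
    clusters = [list(c) for c in clusters]
    best = pooled_average(clusters, dist)
    while True:
        best_move = None
        for src in clusters:
            if len(src) < 2:
                continue  # never empty a cluster
            for item in list(src):
                for dst in clusters:
                    if dst is src:
                        continue
                    # Try the shift, score it, then undo it.
                    src.remove(item)
                    dst.append(item)
                    score = pooled_average(clusters, dist)
                    dst.remove(item)
                    src.append(item)
                    if score < best - 1e-9:
                        best, best_move = score, (item, src, dst)
        if best_move is None:
            return clusters, best
        item, src, dst = best_move
        src.remove(item)
        dst.append(item)

# Initial grouping of Table 3, as 0-based job indices (job 1 is index 0).
start = [[0, 1, 3, 8], [2, 4, 10, 11], [5, 6, 7, 9]]
final, score = reassign(start, D)
```

Under these assumptions the search stops at the grouping of Table 4 (jobs 1, 2, 5; jobs 3, 4, 11, 12; jobs 6 through 10). A search of this kind finds a grouping that no single shift improves; it does not by itself show that no better grouping exists.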


The nature of a given family can best be defined by computing the cen- 
troid of the jobs which make up the family. These centroids are shown in 
Table 5. Thus, Family A is made up of jobs which call for relatively high 
amounts of familiarity with tools, manipulative ability, spatial judgment, and 
facility in manipulating multiple controls. Family B, by contrast, emphasizes 
social adaptability and ability to take responsibility for the work of others. 
Family C is the one that is highest on clerical perception, arithmetic computa- 
tion, and fluency of expression, and is very low on strength, coordination, and 
the like. Factors which do not serve to differentiate any of the families to an 
appreciable extent are accuracy, emotional control, speed, concentration amid 
distractions, induction, and flexibility. The dimensions which differentiate 
between the clusters provide initial hypotheses as to dimensions important for 
a personnel classification program, and the extent to which a given factor 
differentiates is a cue to its significance for such a program. 
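The centroid of a family is simply the mean rating of its member jobs on each attribute. A minimal sketch follows; the ratings are hypothetical and are not values from Table 1, and the name `centroid` is illustrative.

```python
def centroid(rows):
    """Mean rating on each attribute over the jobs that make up one cluster."""
    return [sum(col) / len(rows) for col in zip(*rows)]

# Hypothetical ratings of three jobs on two attributes (not values from Table 1).
family = [[6.0, 5.5], [4.0, 7.0], [8.0, 6.5]]
center = centroid(family)
```

Comparing such centroids attribute by attribute is what identifies the dimensions on which the families differ and those on which they do not.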

The approximation procedure for arriving at the optimum definition of 
clusters for a specified value of k, the number of clusters, seems moderately 
satisfying. Now we must face the much nastier problem of determining the 
appropriate value for k. Into how many families should the specimens be
grouped? 

Obviously, every increase in the number of families results in some reduc- 
tion in the average distance within families, just as every addition of a variable 


TABLE 5
Average Weights for Three Job Clusters on Each of 19 Attributes

Attribute                                        Clusters: A   B   C

 1. Physical strength and endurance
 2. Knowledge of hand and power tools
 3. Fluency of expression
 4. Accuracy
 5. Manipulative ability
 6. Responsibility for work of others
 7. Emotional control
 8. Speed
 9. Foot-hand coordination
10. Work under dangerous conditions
11. Clerical perception
12. Concentration amid distractions
13. Induction
14. Interpreting maps, diagrams, etc.
15. Spatial judgment
16. Flexibility
17. Arithmetic computation
18. Social adaptability
19. Actuating multiple controls
to a multiple regression equation results in some further increase in the value
of the multiple correlation. The more pieces into which we chop our m-space, 
the shorter the distances within each. How are we to decide when to stop? 
Here I must admit that I am stumped. 

As I have indicated, with every increase in k there will be a decrease in 
the average within-cluster distance (which we may call A). The manner in 
which the distance decreases for our illustrative example is shown in Figure 1. 
Ideally, one would like some type of significance test of the change in A as k 
increases from 2 to 3 to 4, and so on. But I am unable to produce such a test. 
Furthermore, I suspect that if one could be developed it would involve an 
assumption of normality of the distributions of the specimens in the various 
dimensions. This assumption is in fairly direct conflict with the notion of 
families, or clusters, or types. In the one case, we assume continuous unimodal 
distributions. In the other case, we are interested in foci, in more dense
concentrations of specimens in certain limited regions. It is when such
concentrations exist that a distinctively “best” set of families will be found.

FIGURE 1. Average Within-Cluster Distance for Different Numbers of Clusters
(based on distances for 12 Air Force jobs). Vertical axis: average
within-cluster distance; horizontal axis: number of clusters, 2 to 7.

One might examine the drop in A with the increase in k, using a diagram
such as Figure 1. Intuitively, it seems that a sudden marked flattening of the
curve at any point should identify a distinctively “right” value of k. That is,
this should be a point at which the number of families uniquely corresponds
to the configuration of points, since there is relatively little gain from further 
increase in the number of clusters. I have tried to test this out empirically, 
using synthetic data. That is, I have built up sets of points which were dis- 
tributed around a known number of specific foci, with random variation away 
from these foci introduced, and then determined the clusters for successively 
larger values of k. The results for three examples are shown in Figure 2. The 
curves do not provide much support for the intuitive specification of the 
number of clusters. 
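The construction of such synthetic data can be sketched as follows. Everything specific here is an assumption made for illustration: two dimensions, four foci at the corners of a square, ten points per focus, and unit normal scatter. The sketch does not run the cluster search; it only evaluates the average within-cluster distance for three fixed groupings of the same points.

```python
import math
import random

def pooled_average(points, labels):
    """Average distance over all pairs of points that share a label."""
    total, count = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i] == labels[j]:
                total += math.dist(points[i], points[j])
                count += 1
    return total / count

rng = random.Random(0)                         # fixed seed for repeatability
foci = [(0, 0), (10, 0), (0, 10), (10, 10)]    # assumed positions of the four foci
points, focus_of = [], []
for f, (fx, fy) in enumerate(foci):
    for _ in range(10):
        # Random variation away from the focus on each dimension.
        points.append((fx + rng.gauss(0, 1), fy + rng.gauss(0, 1)))
        focus_of.append(f)

avg_k4 = pooled_average(points, focus_of)                     # the four true families
avg_k2 = pooled_average(points, [f // 2 for f in focus_of])   # foci merged in pairs
avg_k1 = pooled_average(points, [0] * len(points))            # one undivided group
```

With foci this far apart relative to the scatter, the average drops sharply on reaching the true number of families. The examples in the text evidently did not show so clean a break, which suggests that the sharpness depends on how well separated the foci are.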
FIGURE 2. Average Within-Cluster Distance for Different Numbers of Clusters
(data for three synthetic examples built around four foci).

Finally, one might specify the number of clusters simply by administrative
fiat, in terms of purely practical considerations. Thus, one might decide
that practical limitations in maintaining records, scoring tests, making
assignments and the like limit one to no more than six different appraisals of the
individual, and rule that the number of appraisals shall be six. One would then
set out to delimit six clusters in such a way that within the six a maximum of
compactness resulted. (The correlative result is that there is a maximum of
variance between the centroids of the clusters.) One might then apply multiple
discriminant analysis to one’s test battery to find test weights which would
maximally differentiate the clusters.


At this point I can sense the bubbling up of doubts and questions: “But
what about your units?” ... “How can you decide what dimensions to use?”
... “What about the error variance in the location of any single specimen?” ...
“What has all this got to do with the organization of psychological
associations?”

I can do no better than emulate the good Dr. Tenib. Is time to go home. 
Sleep on question. Maybe tomorrow you give me answers. 
IMAGE THEORY FOR THE STRUCTURE OF QUANTITATIVE
VARIATES* 


Louis GUTTMAN 
THE ISRAEL INSTITUTE OF APPLIED SOCIAL RESEARCH 


A universe of infinitely many quantitative variables is considered, 
from which a sample of n variables is arbitrarily selected. Only linear
least-squares regressions are considered, based on an infinitely large population
of individuals or respondents. In the sample of variables, the predicted value of a
variable x from the remaining $n - 1$ variables is called the partial image of x,
and the error of prediction is called the partial anti-image of x. The predicted
value of x from the entire universe, or the limit of its partial images as $n \to \infty$,
is called the total image of x, and the corresponding error is called the total
anti-image. Images and anti-images can be used to explain “why” any two
variables $x_j$ and $x_k$ are correlated with each other, or to reveal the structure of
the intercorrelations of the sample and of the universe. It is demonstrated that 
image theory is related to common-factor theory but has greater generality 
than common-factor theory, being able to deal with structures other than 
those describable in a Spearman-Thurstone factor space. A universal comput- 
ing procedure is suggested, based upon the inverse of the correlation matrix. 


1. The Multiple-Correlation Approach to the Notion of “Commonness”


There are two ways in which it is conventional to try to explain “why” 
statistical variables are intercorrelated. One is based on multiple correlation 
and the other on partial correlation. 

The partial-correlation approach has been utilized to develop a theory 
to explain all intercorrelations simultaneously within a set of variates, 
namely, the theory of common factors. Spearman’s celebrated hypothesis 
was that mental tests were intercorrelated because they had a single general 
factor in common; if this factor were partialed out, no correlations would 
remain. The generalization to multiple common factors by Spearman, 
Thurstone, and others remains a partial-correlation approach. If m variables 
can be found such that when they are partialed out the observed inter-test 
correlations vanish, then these m variables are said to constitute a set of 
m common factors, and to represent what the original tests have in common 
(4). 

Common-factor theory is still beset with several different kinds of 

*This paper introduces one of three new structural theories, each of which generalizes 
common-factor analysis in a different direction. Nodular theory extends common-factor 
analysis to qualitative data and to data with curvilinear regressions (6). Order-factor theory 
introduces the notions of order among the observed variables and of separable factors (7). 
The present image theory is relevant also to the other two. 

Attention may be called to empirical results published since this paper was written: 
Louis Guttman, “Two new approaches to factor analysis,” Annual Technical Report on


contract Nonr 731 (00). The present research was aided by an uncommitted grant-in-aid 
from the Ford Foundation. 
problems of indeterminacy (among them the problems of communalities,
of rotation of axes, and of estimating factor scores) arising from the fact 
that the m variables to be partialed out are hypothetical in the first instance. 
Many controversies exist as to how to make these variables concrete, and 
many scientists are sceptical of the validity of the basic premises. 

It is interesting that hitherto only the partial-correlation approach— 
using controversial hypothetical variables—has been used for a structural 
analysis of a set of variates, despite the fact that the more concrete notions 
involved in the multiple-correlation approach seem older and more widely 
accepted. Apparently no systematic attempt has been made previously to 
capitalize on the structural possibilities of the multiple-correlation approach. 
Such an investigation is the purpose of the present paper. 

We shall show how the intercorrelations within a set of variables all 
can be simultaneously interpreted or explained by means of their mutual 
multiple regressions, the latter determining, in a certain unambiguous manner, 
what the observations have in common. 

We treat here the case of quantitative variables with linear least-squares 
regressions. Elsewhere we shall treat qualitative cases [as in (5)]. 

For the multiple-correlation approach, we need introduce no hypothetical
variables. If we are given a set of n observed variables $x_j$, we can consider
directly the multiple regression of each variable on all the remaining $n - 1$
variables. If $r_j$ is the resulting coefficient of multiple correlation for $x_j$,
then traditionally $r_j^2$ has been called the “index* of determination” of $x_j$
from the remaining variables, and $1 - r_j^2$ the “coefficient of alienation.”

Indeed, $r_j^2$ represents the proportion of the total variance of $x_j$ that is
dependent on the remaining variables, and in a real sense expresses how
much $x_j$ has in common with other variables. If $r_j^2 = 0$, then $x_j$ has nothing
in common with the other variables; in fact, it also then correlates zero
with each separately. If $r_j^2 = 1$, then $x_j$ is linearly dependent on the remaining
variables, so that whatever could be done with $x_j$ could be done as well
without it; the remaining variables contain all the relevant information for
any problem. Values of $r_j^2$ intermediate between zero and unity, then, express
intermediate degrees of commonness between $x_j$ and all the remaining
variables.
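The index of determination of each variable can be computed from the inverse of the correlation matrix by a standard identity: if $r^{jj}$ is the jth diagonal element of the inverse, the squared multiple correlation of variable j on the rest is $1 - 1/r^{jj}$. The sketch below illustrates that identity only; the correlation matrix is hypothetical, the function names are illustrative, and no claim is made that this is the computing routine the paper goes on to propose.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def indices_of_determination(corr):
    """Squared multiple correlation of each variable on all the others."""
    inv = invert(corr)
    return [1.0 - 1.0 / inv[j][j] for j in range(len(corr))]

# Hypothetical correlations among three variables (illustration only).
R = [[1.0, 0.6, 0.3],
     [0.6, 1.0, 0.0],
     [0.3, 0.0, 1.0]]
r2 = indices_of_determination(R)
```

Because the second and third variables are uncorrelated in this example, the index for the first is just the sum of its two squared correlations, 0.36 + 0.09 = 0.45, which gives a quick check on the routine.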

This can be seen further by studying the classical normal equations from 
which one computes the multiple regression coefficients. According to these 
equations, we break $x_j$ up into two parts, say $p_j$ and $e_j$, where $p_j$ is the
predicted value and $e_j$ is the error of prediction:

$$x_j = p_j + e_j .$$


*Although there is no standard usage in the literature, we shall systematically use the 
word “index” to refer to the square of a correlation coefficient, to distinguish the square 
from the coefficient itself. 
The equations state in particular that $e_j$ correlates zero with $p_j$: the prediction
and the error of prediction are uncorrelated. But more important for us, the
equations state more generally that $e_j$ correlates zero with each predictor
$x_k$ separately, or

$$r_{e_j x_k} = 0 \qquad (j \neq k).$$

Thus $x_j$ is broken up into two parts. One part, $p_j$, is perfectly dependent
on the remaining $n - 1$ variables; the other part, $e_j$, correlates zero with
each and every one of the $n - 1$ predictors, and hence with any possible
(linear) combination of them. The multiple correlation of $p_j$ on the predictors
is perfect; the multiple correlation of $e_j$ on the predictors is zero.
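The decomposition can be checked numerically on a small case. The sketch below regresses one variable on two others by solving the normal equations directly and then verifies that the error part is uncorrelated with each predictor and with the predicted part. The six scores are hypothetical, and a finite sample stands in for the infinite population that the paper assumes.

```python
def center(v):
    """Express scores as deviations from their mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical scores of six respondents on three variables (illustration only).
x1 = center([2.0, 4.0, 5.0, 4.0, 5.0, 7.0])
x2 = center([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x3 = center([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])

# Normal equations for regressing x1 on x2 and x3 (a 2 x 2 system, Cramer's rule).
s22, s33, s23 = dot(x2, x2), dot(x3, x3), dot(x2, x3)
s12, s13 = dot(x1, x2), dot(x1, x3)
det = s22 * s33 - s23 * s23
b2 = (s12 * s33 - s13 * s23) / det
b3 = (s13 * s22 - s12 * s23) / det

p1 = [b2 * a + b3 * b for a, b in zip(x2, x3)]   # predicted part of x1
e1 = [x - p for x, p in zip(x1, p1)]             # error of prediction

# Cross-products of deviation scores; zero cross-product means zero correlation.
cross_e_x2 = dot(e1, x2)
cross_e_x3 = dot(e1, x3)
cross_e_p = dot(e1, p1)
```

All three cross-products vanish up to rounding error, which is the content of the two statements above: the error correlates zero with every predictor, and therefore with the predicted part, which is a linear combination of the predictors.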

Multiple correlation gives us this simple and profound property of break- 
ing each variable into two parts, one of which is determined entirely by the 
remaining variables, and the other of which has no relation with the remaining 
variables. 

The study of the common and alien parts of the observed variates, 
as defined by multiple correlation, we propose to call image analysis, a name
suggested by the n-dimensional geometry of the situation (11). 

Paradoxically, the alien parts can play a role in the observed inter-test 
correlations, which is one of the major points analyzed in the present paper, 

especially in §8 below. Indeed, in a sense, the “alien” parts are more basic 
than the “common” parts, as shown in the final §11 below.


2. Relationship to Common-Factor Theory 


It is of interest to inquire as to what relationship image analysis has to 
common-factor analysis in the Spearman-Thurstone sense. It turns out 
that image analysis is the more basic and inclusive approach; it includes 
common-factor theory as a special case. That this might be so could possibly 
be surmised by considering the respective properties of partial correlation and 
multiple correlation. A partial correlation coefficient in general can either 
increase or decrease as the number of variables eliminated increases; but 
common-factor theory is concerned only with a special kind of circumstance 
wherein partial correlations tend only to zero. On the other hand, a multiple 
correlation can never decrease as the number of predictors increases; in 
general, the correlation increases. This nondecreasing property is all that is 
required by image theory; so no restrictions at all are involved, and the theory 
is universally appropriate. 

Because of its universality, image theory throws considerable light on 
common-factor theory, as well as on order-factor theory (7) and on any 
other special theory. It shows under what special circumstances a universe 
of data admits of a common-factor structure at all, regardless of the number 
of common factors. This we shall see in the present paper. In a later paper, 
we shall see how image theory explains why the problem of communalities 
has not been solved yet in the Spearman-Thurstone theory, and how a
universal solution is impossible; it will also be shown there how misleading 
present computing routines can sometimes be that are based on “extracting” 
common factors (10). 

A new, universal computing routine will be suggested that will help 
distinguish for a given set of data whether they have a finite common-factor 
structure (no matter what the finite number of common factors may be), 
an order-factor structure (simplex, circumplex, etc.), or some other kind of 
structure. 


3. The Observed Correlation Matrix 


We are concerned with a universe of content* of indefinitely many quanti- 
tative variables, which is defined in advance of any statistical analysis. 
We assume that each variable has a finite population variance, and that all 
regressions are linear.†

It is essential to distinguish between the universe and any finite samples 
of n variables that may be selected from it. We assume nothing about random
sampling of variables, only that we can arrange the universe in an arbitrary
order in which our particular sample will be the first n variables.

With respect to the population of individuals observed, it too is assumed 
indefinitely large. We are not concerned here with samples of people; through- 
out we treat only population parameters. 

If $x_j$ denotes the $j$th variable from the universe, then let $x_{ji}$ be the
score of person $i$ on this variable. As usual, we can set the population mean
of each variable equal to zero, and the variance equal to unity. Thus,

$$E_i\, x_{ji} = 0, \qquad E_i\, x_{ji}^2 = 1, \qquad (j = 1, 2, \cdots), \tag{1}$$

where the notation $E_i$ denotes the expected or mean value over the infinite
population of individuals. Then the population correlation coefficient, $r_{jk}$,
between any two observed variables is simply their covariance,

$$r_{jk} = E_i\, x_{ji} x_{ki}, \qquad (j, k = 1, 2, \cdots). \tag{2}$$

If we are dealing with a finite number of $n$ variables, then the values
$r_{jk}$ can be expressed as a Gramian matrix of order $n$ which we shall denote
by $R_n$,

$$R_n = \| r_{jk} \|, \qquad (j, k = 1, 2, \cdots, n). \tag{3}$$

The entries in the main diagonal of $R_n$ are each unity, according to (2) and
(1), indicating the total self-correlations.


*A term originating in the context of scale analysis, but appropriate more generally. 
†Peculiarly, the theorems below do not depend on the true regressions; these may be
curvilinear. Of course, the meaning of our results is fullest if the true regressions actually 
are linear; and even more, if the zero correlations imply complete statistical independence. 
As more variables from the universe are added to the initial set of $n$
in (3), nothing happens to the initial entries $r_{jk}$ except that more rows and
columns surround them. An observed correlation coefficient as in (2) between
any two variables is not a function of $n$; it does not depend on which other
variables are in the set. Therefore, if we inquire what happens to $R_n$ in the
limit as $n \to \infty$, we can state that there always is a limiting matrix, which
we shall denote by $R_\infty$,

$$R_\infty = \lim_{n \to \infty} R_n. \tag{4}$$

$R_\infty$ is an infinite Gramian matrix, and represents the correlations between
all variables in the infinite universe of content.


4. The Inverse of the Correlation Matrix and Its Problem of Limits 


The inverse of the observed correlation matrix plays a central role in 
our analysis. We shall usually assume that, for a given set of $n$ variables,
$R_n$ is nonsingular and possesses an inverse. This will be true, for instance,
if all observed variables are experimentally independent and have retest
reliabilities less than unity. The assumption of nonsingularity is usually
correct in practice.

The inverse of $R_n$ will be denoted as usual by $R_n^{-1}$. In contrast to the
elements of $R_n$, the elements of $R_n^{-1}$ are functions of $n$ and change as additional
variables are added to the set. As is well known, the elements of $R_n^{-1}$
can be expressed in terms of minor determinants of $R_n$. Let

$\Delta^{(n)}$ = the determinant of $R_n$,

and let

$\Delta_{jk}^{(n)}$ = the cofactor of $r_{jk}$ in $R_n$.

Then:

$$R_n^{-1} = \left\| \frac{\Delta_{jk}^{(n)}}{\Delta^{(n)}} \right\|. \tag{5}$$

$\Delta^{(n)}$ clearly varies as $n$ varies; and for fixed $j$ and $k$, $\Delta_{jk}^{(n)}$ also varies with $n$.


The sample of $n$ variables is studied in order to yield inferences about
the universe of content. We must ask what will happen if we increase the
size of the sample. If nothing definite happens in the limit as $n \to \infty$, then
surely we cannot infer much about the universe, and any structural theory
we may have will be unfounded. The differences among finite common-factor
structures, order-factor structures, and other kinds of structures will
be seen to depend largely on what happens in the limit of $R_n^{-1}$ as $n \to \infty$.

First we note that, in dealing with infinite matrices, the algebra of
finite matrices need not at all hold. If there is a definite limit to $R_n^{-1}$, then in
general it is not the same as the inverse of $R_\infty$; that is, in general,

$$R_\infty^{-1} \neq \lim_{n \to \infty} R_n^{-1}, \tag{6}$$

even if both members exist. Indeed, the right member of (6) may exist and
hence be uniquely defined; but at the same time $R_\infty^{-1}$ need not exist, or
alternatively $R_\infty^{-1}$ may represent more than one matrix. Even if $R_n^{-1}$ converges to
something definite, we have no assurance that there exists a matrix $R_\infty^{-1}$
such that $R_\infty R_\infty^{-1} = I_\infty$. Even if such an $R_\infty^{-1}$ exists, it may be only a right
inverse and not a left inverse, so that $R_\infty^{-1} R_\infty \neq I_\infty$; or there may be more
than one such inverse to $R_\infty$. These are paradoxes of infinity. For finite
matrices, right and left inverses always are identical and unique.

The importance of inequality (6) is illustrated by common-factor theory.
A number of years ago it was proved that the foundations of the Spearman-Thurstone
approach rest essentially on the proposition that inequality (6)
holds; in particular that $\lim_{n \to \infty} R_n^{-1}$ exists and is a diagonal matrix (2) or
that nondiagonal elements vanish in the limit,

$$\lim_{n \to \infty} \frac{\Delta_{jk}^{(n)}}{\Delta^{(n)}} = 0, \qquad (j \neq k). \tag{7}$$

Such a diagonal matrix clearly cannot be an inverse for $R_\infty$. This hitherto
little-noticed theorem has most practical consequences, for it provides an
entirely new way of testing empirical data for the possible existence of common
factors. Given an observed matrix $R_n$, compute $R_n^{-1}$ and see if the nondiagonal
elements are all close to zero. Such a criterion requires no preliminary determination
of “communalities” nor “fitting” of factor loadings, nor specification
of the number of common factors. If the criterion (7) is not satisfied, then it
is usually futile to attempt to “fit” any common-factor space of finite rank
to the data. Image theory will enable us to improve on and to generalize this
criterion, as is shown in §11 below.
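
The check described above can be sketched numerically. The following is only an illustration under stated assumptions: the one-factor matrix with equal loadings of 0.6 is a hypothetical universe chosen for convenience, and the function names are not from the paper. It computes the largest nondiagonal element of $R_n^{-1}$ for growing $n$.

```python
import numpy as np

def max_offdiag_of_inverse(R):
    """Largest absolute nondiagonal element of the inverse of R."""
    R_inv = np.linalg.inv(R)
    off = R_inv - np.diag(np.diag(R_inv))
    return float(np.max(np.abs(off)))

def one_factor_matrix(n, loading=0.6):
    """Hypothetical correlation matrix: n variables, one common factor, equal loadings."""
    R = np.full((n, n), loading ** 2)
    np.fill_diagonal(R, 1.0)
    return R

sizes = [5, 20, 80, 320]
trend = [max_offdiag_of_inverse(one_factor_matrix(n)) for n in sizes]
for n, value in zip(sizes, trend):
    print(n, round(value, 4))
```

In this hypothetical one-factor universe the nondiagonal elements of the inverse shrink as variables are added, which is the behavior criterion (7) asks for.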

One example where criterion (7) is not satisfied can be shown to be the
simplex matrix, where the correlations have the law of formation,

$$r_{jk} = a_j b_k, \qquad (j < k),$$

$a_j$ and $b_j$ being two certain parameters belonging to $x_j$. It is futile to attempt
to find any finite number of common factors for such a matrix as $n \to \infty$.
Actually, a much simpler theory than that of finite common factors holds (7).
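
A small numerical sketch may make the simplex case concrete. It assumes one particular simplex, $r_{jk} = \rho^{|j-k|}$ with $\rho = 0.7$ (that is, $a_j = \rho^{-j}$ and $b_k = \rho^{k}$); this choice and the function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simplex_matrix(n, rho=0.7):
    """Hypothetical simplex: r_jk = a_j * b_k for j < k, with a_j = rho**(-j), b_k = rho**k."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def max_offdiag_of_inverse(R):
    """Largest absolute nondiagonal element of the inverse of R."""
    R_inv = np.linalg.inv(R)
    off = R_inv - np.diag(np.diag(R_inv))
    return float(np.max(np.abs(off)))

sizes = [5, 20, 80]
values = [max_offdiag_of_inverse(simplex_matrix(n)) for n in sizes]
for n, value in zip(sizes, values):
    print(n, round(value, 4))
```

For this simplex the inverse is tridiagonal, and the elements next to the diagonal stay at $\rho / (1 - \rho^2)$ in absolute value however large $n$ becomes, so the nondiagonal elements do not all vanish in the limit.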

So much for what is for the moment a digression, to emphasize the 
importance of proving the possible existence of any quantities we may want to 
hypothesize. It is not to be regarded merely as a matter of mathematical 
pedantry. 


5. Partial Images and Total Images 


A sample of $n$ variables from the universe of content defines a partial
image for each variable, namely, its predicted values from the remaining
$n - 1$ variables. The limit of the partial images of $x_j$ as $n$ becomes infinite
will be called the total image of $x_j$ in the universe of content. Let $p_{ji}^{(n)}$ denote
the predicted value of $x_{ji}$ from the remaining $n - 1$ variables in the sample,
and let $p_{ji}^{(\infty)}$ denote the limit as $n \to \infty$:

$$p_{ji}^{(\infty)} = \lim_{n \to \infty} p_{ji}^{(n)}. \tag{8}$$

We assume for the moment that the limit in (8) exists; in a later paper (9)
we shall examine this assumption. Then $p_{ji}^{(n)}$ is the partial image score for
person $i$ for variable $x_j$, and $p_{ji}^{(\infty)}$ is his total image score for $x_j$.

It is well-known how to compute $p_{ji}^{(n)}$ from the observed data. If we
let $w_{jk}^{(n)}$ denote the weight of $x_k$ in the multiple regression for predicting $x_j$,
then (cf. 1, 306),

$$w_{jk}^{(n)} = -\frac{\Delta_{jk}^{(n)}}{\Delta_{jj}^{(n)}}, \qquad (j \neq k), \tag{9}$$

provided the denominator on the right does not vanish. Notice that (9)
does not define a value for $j = k$, for this would imply a weight for predicting
the test from itself. It will be convenient, however, apparently to include
the test itself in its own regression by the artifice of giving it a regression
weight of zero. So we define the value for $j = k$ as follows:

$$w_{jj}^{(n)} = 0, \tag{10}$$

for all $j$ and $n$. With this artifice, we can express the partial image scores
of the $x_j$ in the following convenient form,

$$p_{ji}^{(n)} = \sum_{k=1}^{n} w_{jk}^{(n)} x_{ki}. \tag{11}$$

The total image scores of $x_j$ are then, from (8) and (11),

$$p_{ji}^{(\infty)} = \lim_{n \to \infty} \sum_{k=1}^{n} w_{jk}^{(n)} x_{ki}. \tag{12}$$

In the right member of (12), not only the number of terms being summed
depends on $n$, but also the values of the terms themselves, for the regression
weights $w_{jk}^{(n)}$ depend on all the variables used as predictors.

Formula (9) holds for the regression weights, provided the denominator
$\Delta_{jj}^{(n)}$ does not vanish. If $R_n$ is nonsingular, the denominator can never vanish
for any $j$; but if $R_n$ is singular, (9) may not hold and the regression weights
may not be uniquely defined. Regardless, however, the partial image scores
are always uniquely defined, whether $R_n$ is singular or nonsingular. This is
well-known, but it may be well to restate here the fundamentals involved.
This we do in the next section.
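
As a sketch of how (9) and (10) might be evaluated in practice, the ratio of cofactors $\Delta_{jk}^{(n)} / \Delta_{jj}^{(n)}$ equals the ratio of the corresponding elements of $R_n^{-1}$, so the whole weight matrix can be read off the inverse. The 4-variable matrix below is hypothetical, and `image_weights` is an illustrative name rather than anything defined in the paper.

```python
import numpy as np

def image_weights(R):
    """Weights w_jk = -r^{jk} / r^{jj} for j != k, and w_jj = 0, following (9) and (10)."""
    R_inv = np.linalg.inv(R)
    W = -R_inv / np.diag(R_inv)[:, None]   # divide row j of the inverse by r^{jj}
    np.fill_diagonal(W, 0.0)               # the artifice of a zero self-weight
    return W

# Hypothetical nonsingular correlation matrix (illustrative values only).
R = np.array([[1.0, 0.5, 0.3, 0.2],
              [0.5, 1.0, 0.4, 0.3],
              [0.3, 0.4, 1.0, 0.5],
              [0.2, 0.3, 0.5, 1.0]])
W = image_weights(R)

# Cross-check the weights for x_1 against the ordinary normal equations.
others = [1, 2, 3]
beta = np.linalg.solve(R[np.ix_(others, others)], R[others, 0])
print(np.allclose(W[0, others], beta))
```

The cross-check solves the usual regression of the first variable on the other three directly; agreement with the first row of `W` is what formula (9) asserts for a nonsingular $R_n$.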

6. The Fundamental Equation for Least-Squares Images


Consider the weights $w_{jk}^{(n)}$ in (11) as unknowns to be solved for, except
for the self-weights which are always zero as in (10). Let $e_{ji}^{(n)}$ be the errors of
estimate according to the prediction (11), so that

$$x_{ji} = p_{ji}^{(n)} + e_{ji}^{(n)}. \tag{13}$$

Let the variance of the errors for $x_j$ be denoted by $\sigma_{jn}^2$,

$$\sigma_{jn}^2 = E_i\, [e_{ji}^{(n)}]^2. \tag{14}$$

Then we wish to determine the $w_{jk}^{(n)}$ so as to minimize (14). Differentiating
the right member of (14) with respect to the $w_{jk}^{(n)}$, using relations (13) and
(11), shows that a necessary and sufficient condition for attaining a minimum
is that the following fundamental equation holds:

$$E_i\, e_{ji}^{(n)} x_{ki} = 0, \qquad (j \neq k), \tag{15}$$

or that the errors be uncorrelated with each predictor separately. Fundamental
equation (15) expresses the classical normal equations of linear least squares.
There is a unique minimum to (14), obtained by unique values of $e_{ji}^{(n)}$, and
hence of $p_{ji}^{(n)}$. If $R_n$ is singular, it may be that the $x_{ki}$ in (11) are linearly
dependent in such a fashion that more than one set of $w_{jk}^{(n)}$ will yield the same
best prediction $p_{ji}^{(n)}$, but the best prediction itself is uniquely determined
regardless. If $R_n$ is nonsingular, then also the best $w_{jk}^{(n)}$ are uniquely determined
by the data, namely by formula (9).

More generally, then, we can regard (15) as our basic equation for
determining the $w_{jk}^{(n)}$, uniquely or not. Equation (15) is the basic equation
of image analysis, from which all other results follow. Together with definitions 
(11) and (13), equation (15) uniquely determines the partial images, and 
invests them with all their subsequent meaning and properties. 
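
The following sketch illustrates (11), (13), and (15) on scores rather than on a correlation matrix. It rests on assumptions that are not in the paper: a finite simulated sample of 500 persons stands in for the infinite population, so $E_i$ is replaced by a sample mean, and the data-generating recipe is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 500, 5
# Hypothetical scores: mix independent normal variates so the variables correlate.
X = rng.standard_normal((N, n)) @ rng.standard_normal((n, n))
X = (X - X.mean(axis=0)) / X.std(axis=0)     # zero mean and unit variance, as in (1)

images = np.empty_like(X)
for j in range(n):
    others = [k for k in range(n) if k != j]
    # Least-squares weights; lstsq also returns a solution when predictors are dependent.
    w, *_ = np.linalg.lstsq(X[:, others], X[:, j], rcond=None)
    images[:, j] = X[:, others] @ w          # partial image, equation (11)
anti_images = X - images                     # partial anti-image, equation (13)

# Cross-products of anti-images with all variables; off-diagonal entries should vanish by (15).
cross = anti_images.T @ X / N
off_diag = cross - np.diag(np.diag(cross))
print(float(np.max(np.abs(off_diag))))
```

Only the off-diagonal entries are expected to vanish: the diagonal of `cross` holds the cross-product of each anti-image with its own variable, which is the error variance of that regression.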

The errors of prediction from the partial images play a prominent role
in our theory. We shall call the $e_{ji}^{(n)}$ the partial anti-image scores of $x_j$. Then,
parallel to (12), the total anti-image scores will be denoted by $e_{ji}^{(\infty)}$ and

$$e_{ji}^{(\infty)} = \lim_{n \to \infty} e_{ji}^{(n)}, \tag{16}$$

assuming the limit on the right exists.

One immediate consequence of basic equation (15) is that the partial
image and anti-image of each $x_j$ correlate zero with each other,

$$E_i\, e_{ji}^{(n)} p_{ji}^{(n)} = 0. \tag{17}$$

This well-known result follows by multiplying each member of (11) by
$e_{ji}^{(n)}$, taking expectations over $i$ and using (15) and (10).
Let $\rho_{jn}$ be the multiple correlation coefficient of $x_j$ on the remaining
$n - 1$ variables. Since the variance of $x_j$ is unity, then, as is well-known,
$\rho_{jn}$ is also the standard deviation of the $p_{ji}^{(n)}$, or,

$$\rho_{jn}^2 = E_i\, [p_{ji}^{(n)}]^2. \tag{18}$$

A well-known consequence of (17) is, then, that

$$\rho_{jn}^2 + \sigma_{jn}^2 = 1. \tag{19}$$

It is $\rho_{jn}^2$ that has traditionally been called the “index of determination,”
and $\sigma_{jn}^2$ the “index of alienation.” To avoid possible notions of determination
in the sense of causation, and to use a more convenient terminology for our
purposes, we shall call $\rho_{jn}^2$ the partial norm of $x_j$, and $\sigma_{jn}^2$ the partial antinorm.

Geometrically, the $n$ variables $x_j$ can be described as unit vectors with
a common origin, defining an $n$-dimensional Euclidean space. A correlation
$r_{jk}$ is the cosine of the angle between $x_j$ and $x_k$. The image variable of $x_j$ is
then represented by the projection of the vector $x_j$ on the $(n - 1)$-dimensional
space defined by the remaining vectors; and $\rho_{jn}$ is the cosine of the angle
between $x_j$ and its projection, as well as being the length of the projection
vector. $\sigma_{jn}$ is the distance between the termini of the vectors of $x_j$ and its
projection, as well as being the cosine of one of the angles involved.

It is interesting that this geometry of image theory was known long 
before the advent of common-factor theory, which uses a similar geometry 
(cf. 11).

A norm, then, is the square of the length of a test vector’s projection; 
and an antinorm is the square of the distance between the termini of a vector 
and its projection. Equation (19) expresses the Pythagorean theorem for 
the right triangle defined by the vector of $x_j$ and its projection or image.
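
The Pythagorean relation (19) can be checked numerically. The sketch below is a hedged illustration: the correlation matrix is hypothetical, the antinorm is taken as the reciprocal of the diagonal element of $R_n^{-1}$ (the standard expression for the error variance of a regression on the other variables), and the norm is computed separately as the variance of the partial image.

```python
import numpy as np

# Hypothetical nonsingular correlation matrix (illustrative values only).
R = np.array([[1.0, 0.5, 0.3, 0.2],
              [0.5, 1.0, 0.4, 0.3],
              [0.3, 0.4, 1.0, 0.5],
              [0.2, 0.3, 0.5, 1.0]])
R_inv = np.linalg.inv(R)

antinorms = 1.0 / np.diag(R_inv)            # sigma_jn^2, error variance of each regression
W = -R_inv * antinorms[:, None]             # image weights, row j scaled by 1 / r^{jj}
np.fill_diagonal(W, 0.0)                    # self-weights are zero, as in (10)
norms = np.diag(W @ R @ W.T)                # rho_jn^2, variance of each partial image (18)

print(norms + antinorms)                    # each entry should equal 1, as in (19)
```

The two quantities are obtained by different routes here, so their summing to unity is a genuine check rather than a restatement of one definition.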

The total norm of $x_j$ will be defined as the limit of its partial norms,
and will be denoted by $\rho_{j\infty}^2$,

$$\rho_{j\infty}^2 = \lim_{n \to \infty} \rho_{jn}^2. \tag{20}$$

A similar definition holds for the total antinorm, denoted by $\sigma_{j\infty}^2$,

$$\sigma_{j\infty}^2 = \lim_{n \to \infty} \sigma_{jn}^2. \tag{21}$$

Obviously, if the limits in (20) and (21) exist, then from (19),

$$\rho_{j\infty}^2 + \sigma_{j\infty}^2 = 1. \tag{22}$$

That total norms and antinorms always exist is easily established.
It is well-known that a multiple correlation coefficient cannot decrease as
more variables are added to the regression. Therefore, for each $j$, $\rho_{jn}$ is a
monotonically nondecreasing function of $n$. Being bounded above by unity, it
follows that there must exist a limit to $\rho_{jn}$ as $n \to \infty$, by the usual theorem
on bounded monotone functions. Similarly, $\sigma_{jn}$ always has a limit as $n \to \infty$.
These results we shall state as:


THEOREM 1: Total norms and antinorms, $\rho_{j\infty}^2$ and $\sigma_{j\infty}^2$, always exist for
each variable $x_j$ in an infinite universe of content (where each $x_j$ has unit variance).


Further problems of existence of limits—with respect to individual 
image scores and parameters associated with them—will be treated in a 
separate paper (9). 
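
The nondecreasing property behind Theorem 1 can be watched on a finite example. This is only a sketch: the "universe" below is a hypothetical set of 12 variables built from random unit vectors, and `partial_norm` is an illustrative helper, not a routine from the paper.

```python
import numpy as np

def partial_norm(R, j=0):
    """Squared multiple correlation of x_j on the other variables: 1 - 1 / r^{jj}."""
    return 1.0 - 1.0 / np.linalg.inv(R)[j, j]

# Hypothetical universe of 12 variables: correlations of random unit vectors.
rng = np.random.default_rng(1)
V = rng.standard_normal((12, 30))
V /= np.linalg.norm(V, axis=1, keepdims=True)
R_full = V @ V.T                      # Gramian matrix with unit diagonal

# Partial norm of the first variable as the sample grows from 2 to 12 variables.
norms = [partial_norm(R_full[:n, :n]) for n in range(2, 13)]
print([round(v, 4) for v in norms])
```

Each added variable can only leave the partial norm of the first variable unchanged or raise it, and the sequence stays below unity, which is the bounded monotone behavior the existence argument uses.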


7. The Fundamental Identity for Least-Squares Images 


The purpose of any structural analysis is to provide a framework for 
comprehending the interrelationships among observations. Our present 
problem is to “explain” the correlation coefficients $r_{jk}$. For this purpose,
image analysis has a universally applicable ‘explanation,’ as stated in the 
following fundamental theorem: 


THEOREM 2: The correlation between any two different observed variables
(with unit variances) from a given set of $n$ variables is the difference between
the covariance of their respective partial images and the covariance of their
respective partial anti-images. That is, if we let $g_{jk}^{(n)}$ and $\gamma_{jk}^{(n)}$ be the covariance
between the partial images and anti-images, respectively, for $x_j$ and $x_k$, or,

$$g_{jk}^{(n)} = E_i\, p_{ji}^{(n)} p_{ki}^{(n)}, \tag{23}$$

and

$$\gamma_{jk}^{(n)} = E_i\, e_{ji}^{(n)} e_{ki}^{(n)}, \tag{24}$$

then the following identity always holds:

$$r_{jk} = g_{jk}^{(n)} - \gamma_{jk}^{(n)}, \qquad (j \neq k). \tag{25}$$
To establish the theorem, we first multiply both members of (13) by
$x_{ki}$, take expectations over $i$ (remembering (2)) and arrive at

$$r_{jk} = E_i\, p_{ji}^{(n)} x_{ki} + E_i\, e_{ji}^{(n)} x_{ki}. \tag{26}$$

Now the second term on the right vanishes for $j \neq k$, according to (15).
Hence (26) becomes

$$r_{jk} = E_i\, p_{ji}^{(n)} x_{ki}, \qquad (j \neq k). \tag{27}$$

The left member is symmetric in $j$ and $k$. Hence we can rewrite the right
member by interchanging $j$ and $k$ without altering the result:

$$r_{jk} = E_i\, p_{ki}^{(n)} x_{ji}, \qquad (j \neq k). \tag{28}$$
¢ 
Equations (28) and (27) state that $x_j$ has the same covariance with
$x_k$ as with the partial image of $x_k$, and vice versa.

Multiply both members of (13) by $p_{ki}^{(n)}$, take expectations over $i$, and
use definition (23):

$$E_i\, p_{ki}^{(n)} x_{ji} = g_{jk}^{(n)} + E_i\, p_{ki}^{(n)} e_{ji}^{(n)}. \tag{29}$$

Multiply both members of (13) by $e_{ki}^{(n)}$, take expectations over $i$, and use
definition (24):

$$E_i\, e_{ki}^{(n)} x_{ji} = E_i\, p_{ji}^{(n)} e_{ki}^{(n)} + \gamma_{jk}^{(n)}. \tag{30}$$

Now the left member of (30) vanishes for $j \neq k$, according to (15).
Hence, from (30),

$$-\gamma_{jk}^{(n)} = E_i\, p_{ji}^{(n)} e_{ki}^{(n)} = E_i\, p_{ki}^{(n)} e_{ji}^{(n)}, \qquad (j \neq k). \tag{31}$$

The last member is obtained from the middle member by interchanging subscripts
$j$ and $k$, which is permissible by virtue of the symmetry of the first
member.

Using (31) in (29), and then (29) in (28) establishes that (25) is correct,
or Theorem 2 holds.

It should be remarked that there are no assumptions whatsoever in
establishing equation (25); it is a universal identity. We have not used here
the assumption that $R_n$ is nonsingular, or that the $w_{jk}^{(n)}$ are uniquely defined
as by (9). Only the basic normal equations (15) have been used, which assure
unique values for images and anti-images even when $R_n$ is singular and (9)
does not hold.
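
Identity (25) can be verified directly from a correlation matrix. The sketch below assumes a nonsingular, hypothetical 4-variable matrix so that the weights can be taken from the inverse; the identity itself does not need that assumption, but the shortcut used here does. Variable names are illustrative.

```python
import numpy as np

# Hypothetical nonsingular correlation matrix (illustrative values only).
R = np.array([[1.0, 0.5, 0.3, 0.2],
              [0.5, 1.0, 0.4, 0.3],
              [0.3, 0.4, 1.0, 0.5],
              [0.2, 0.3, 0.5, 1.0]])
n = len(R)
R_inv = np.linalg.inv(R)
W = -R_inv / np.diag(R_inv)[:, None]     # image weights w_jk
np.fill_diagonal(W, 0.0)

G = W @ R @ W.T                          # g_jk: covariances of partial images, (23)
A = np.eye(n) - W                        # anti-image of x_j is row j of A applied to the x's
Gamma = A @ R @ A.T                      # gamma_jk: covariances of partial anti-images, (24)

off = ~np.eye(n, dtype=bool)             # mask selecting j != k
identity_holds = np.allclose(R[off], (G - Gamma)[off])
print(identity_holds)
```

The mask restricts the comparison to $j \neq k$, matching the scope of (25); on the diagonal the difference of the two covariance matrices is not the unit self-correlation.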


8. Interpretation of the Fundamental Identity 


According to identity (25), any correlation coefficient can be regarded 
as the difference between two covariances, one from the common parts of the 
two variables and the other from the alien parts. 

Students of common-factor theory may be puzzled at first by the fact
that the alien parts should be correlated and affect the observed correlation
$r_{jk}$. They are accustomed to the notion of “specific” or “unique” parts
which are mutually uncorrelated and do not affect the $r_{jk}$, $(j \neq k)$. They
may be tempted to take the point of view that the $\gamma_{jk}^{(n)}$ represent “errors of
fit” of the image covariances $g_{jk}^{(n)}$ to the observed correlations $r_{jk}$. We shall
see that such a point of view is correct, provided a finite common-factor space
really exists (regardless of what dimensionality) for the entire universe as
$n \to \infty$.

But we shall also see that such a point of view is very specialized. Let 
us first take the most general view of the situation, and then we shall see 
how various specializations can occur. 
The fundamental equation (15) which led to the fundamental identity
(25) states that the anti-image of $x_j$ is orthogonal to (uncorrelated with)
each of the remaining $x_k$. But, paradoxically, this anti-image is not necessarily
orthogonal to the anti-images of the $x_k$; $\gamma_{jk}^{(n)}$ is not necessarily zero for any
pair of subscripts. An anti-image is always orthogonal to a total predictor, but
not necessarily to parts of that predictor.

It is indeed peculiar that $e_j^{(n)}$ should always be orthogonal to $x_k$, $(j \neq k)$,
but not necessarily to $e_k^{(n)}$ or to $p_k^{(n)}$. It will seem less peculiar if we examine
the meaning of $\gamma_{jk}^{(n)}$ more closely. We shall show now that $\gamma_{jk}^{(n)}$ is intimately
related to the partial correlation between $x_j$ and $x_k$, holding constant the
remaining $n - 2$ variables.

In order to avoid details unnecessary to the main argument, let us
assume $R_n$ to be nonsingular, so that we can use formula (9) for the various
regression weights, as well as further convenient determinantal formulas.
Let $\tau_{jk}^{(n)}$ denote the partial correlation between $x_j$ and $x_k$, eliminating the
$n - 2$ remaining variables. This means that $x_j$ and $x_k$ first are predicted
separately from the $n - 2$ remaining variables (where now $x_j$ is not used in
the regression for $x_k$, nor $x_k$ in the regression for $x_j$, so that different weights
are involved from those of the respective partial-image regressions on $n - 1$
variables) and then the resulting errors of prediction are correlated (1) to
define the partial correlation coefficient $\tau_{jk}^{(n)}$. The well-known determinantal
formula for this partial correlation is (1)

$$\tau_{jk}^{(n)} = -\frac{\Delta_{jk}^{(n)}}{\sqrt{\Delta_{jj}^{(n)} \Delta_{kk}^{(n)}}}, \qquad (j \neq k). \tag{32}$$

Now if we multiply both members of (11) by $e_{li}^{(n)}$, take expectations
over $i$, and use (31), (15), and (10), we have

$$-\gamma_{jl}^{(n)} = w_{jl}^{(n)} E_i\, x_{li} e_{li}^{(n)}, \qquad (j \neq l). \tag{33}$$

But

$$E_i\, x_{li} e_{li}^{(n)} = \sigma_{ln}^2, \tag{34}$$

as can be seen by multiplying both members of (13), written for $x_l$, by $e_{li}^{(n)}$,
taking expectations over $i$ and using (17). Hence, from (33) and (34), revising
the subscripts, we have

$$-\gamma_{jk}^{(n)} = w_{jk}^{(n)} \sigma_{kn}^2, \qquad (j \neq k). \tag{35}$$

The determinantal formula for $\sigma_{jn}^2$ is (1)

$$\sigma_{jn}^2 = \frac{\Delta^{(n)}}{\Delta_{jj}^{(n)}}. \tag{36}$$

Therefore, using (36) and (9) in (35) we arrive at the determinantal formula
for the covariance between any two anti-images,
$$\gamma_{jk}^{(n)} = \frac{\Delta_{jk}^{(n)} \Delta^{(n)}}{\Delta_{jj}^{(n)} \Delta_{kk}^{(n)}}, \qquad (j \neq k). \tag{37}$$

For our purposes, a more striking and important way of expressing $\gamma_{jk}^{(n)}$ is
arrived at by using (36) and (32) in (37) to obtain:

$$\gamma_{jk}^{(n)} = -\tau_{jk}^{(n)} \sigma_{jn} \sigma_{kn}, \qquad (j \neq k). \tag{38}$$

Identity (38) shows precisely how $\gamma_{jk}^{(n)}$ is related to $\tau_{jk}^{(n)}$. This identity affords
an explanation for the paradox of possibly correlated alien parts, which we
shall state here as a theorem.





THEOREM 3: If $R_n$ is nonsingular, and if $\rho_{jk}^{(n)}$ is the correlation between
partial anti-images $e_j^{(n)}$ and $e_k^{(n)}$, then this anti-image intercorrelation is equal
to the negative of the corresponding partial correlation:

$$\rho_{jk}^{(n)} = -\tau_{jk}^{(n)}, \qquad (j \neq k).$$

The covariance $\gamma_{jk}^{(n)}$ vanishes if and only if $\tau_{jk}^{(n)}$ vanishes $(j \neq k)$.

This theorem follows directly from (38), by recognizing that $\rho_{jk}^{(n)} =
\gamma_{jk}^{(n)} / (\sigma_{jn} \sigma_{kn})$ from the usual product-moment formula for a correlation
coefficient.

That $\rho_{jk}^{(n)}$ should be equal and opposite in sign to $\tau_{jk}^{(n)}$ is “obvious” from
the geometric picture involved (cf. 11). $\tau_{jk}^{(n)}$ is the cosine of the angle, say $\theta$,
between two hyperplanes, while $\rho_{jk}^{(n)}$ is the cosine of the angle between
perpendiculars to these two hyperplanes, or of an angle equal to $180^\circ - \theta$.
Theorem 3 thus boils down to a special version of the trigonometric identity
that $\cos(180^\circ - \theta) = -\cos\theta$.

According to Theorem 3, after we have subtracted out the common
part (the partial image) from $x_j$, the alien part that remains behaves
toward $x_k$ almost as if $x_k$ were not in the regression for predicting $x_j$.
Subtracting out the common parts of the variables still leaves room for pairwise
linkages to remain between them of the kind described by their partial
correlations.
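
Theorem 3 lends itself to a direct numerical check. In the sketch below the partial correlation is computed independently of the inverse matrix, by residualizing the pair of variables on the remaining $n - 2$, so the comparison is not circular. The matrix and the function names are hypothetical illustrations.

```python
import numpy as np

def anti_image_corr(R):
    """Correlations among partial anti-images, from weights of the form (9)."""
    R_inv = np.linalg.inv(R)
    W = -R_inv / np.diag(R_inv)[:, None]
    np.fill_diagonal(W, 0.0)
    A = np.eye(len(R)) - W                 # anti-image of x_j is row j of A applied to the x's
    Gamma = A @ R @ A.T                    # anti-image covariances
    s = np.sqrt(np.diag(Gamma))
    return Gamma / np.outer(s, s)

def partial_corr(R, j, k):
    """Correlation of x_j and x_k after regressing each on the other n - 2 variables."""
    rest = [m for m in range(len(R)) if m not in (j, k)]
    pair = [j, k]
    R_pr = R[np.ix_(pair, rest)]
    resid_cov = R[np.ix_(pair, pair)] - R_pr @ np.linalg.solve(R[np.ix_(rest, rest)], R_pr.T)
    return resid_cov[0, 1] / np.sqrt(resid_cov[0, 0] * resid_cov[1, 1])

# Hypothetical nonsingular correlation matrix (illustrative values only).
R = np.array([[1.0, 0.5, 0.3, 0.2],
              [0.5, 1.0, 0.4, 0.3],
              [0.3, 0.4, 1.0, 0.5],
              [0.2, 0.3, 0.5, 1.0]])
P = anti_image_corr(R)
pairs = [(j, k) for j in range(4) for k in range(j + 1, 4)]
checks = [np.isclose(P[j, k], -partial_corr(R, j, k)) for j, k in pairs]
print(all(checks))
```

Each anti-image intercorrelation is compared with the negative of the corresponding partial correlation for all six pairs of the four variables.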

We can now interpret our fundamental identity (25) by rewriting it,
using (38), as

$$r_{jk} = g_{jk}^{(n)} + \tau_{jk}^{(n)} \sigma_{jn} \sigma_{kn}, \qquad (j \neq k). \tag{39}$$

According to (39), an observed total correlation $r_{jk}$ can be regarded as arising
from two sources: (a) the covariance between the common parts of the two
variables, and (b) a special pairwise linkage that may remain between the
two variables after the remaining $n - 2$ variables are partialed out.


9. Comparison with Common-Factor Theory 


The possible pairwise linkages in identity (39) are of profound importance 
for structural analysis. Different patterns of these linkages give rise to different 
kinds of order-factor theories. For the theory of mental activity, these 
linkages make possible some hypotheses as to the physiological workings of
the nervous system (7). 

The Spearman-Thurstone common-factor theory is a special—indeed, 
degenerate—type which specifies zero pairwise linkages. [More generally, it 
is an orderless theory, which is one reason why the problem of rotation of 
axes arises (7).] 

In common-factor theory, it is hypothesized that each $x_{ji}$ can be
expressed as the sum of a common part, say $c_{ji}$, and a unique part, say $u_{ji}$,

$$x_{ji} = c_{ji} + u_{ji}, \tag{40}$$

where the rank of $c_{ji}$ is of basic importance. If the rank is $m$, then there are
$m$ common factors, expressible with unit variances, say $y_f$ $(f = 1, 2, \cdots, m)$,
such that

$$c_{ji} = \sum_{f=1}^{m} a_{jf} y_{fi}, \tag{41}$$

where $a_{jf}$ are weights for the common factors. It is assumed that the unique
parts are orthogonal to the common parts and are also orthogonal to each
other,

$$E_i\, u_{ji} c_{ki} = 0, \tag{42}$$

and

$$E_i\, u_{ji} u_{ki} = 0, \qquad (j \neq k). \tag{43}$$

Hypothesis (42) holds for $j = k$ as well as $j \neq k$; for $j = k$ it is analogous to
identity (17). If (42) and (41) hold, it easily follows that each $u_j$ is orthogonal
to each common factor $y_f$ separately, which is the more traditional way of
presenting the hypotheses of common-factor theory. We are not concerned
here with the $y_f$ separately, however, and do not hypothesize anything
special about them as to whether they are orthogonal to each other or not,
nor how they are to be located within the common-factor space. It is the
common-factor space as a whole that is of present concern, and this is represented
by the $c_j$. The $c_j$ are invariant under any nonsingular transformation
of the $y_f$.

In particular, the variance $\sigma_{c_j}^2$ of $c_j$ is an invariant of the common-factor
space, and is called the communality of $x_j$ (12). Similarly $\sigma_{u_j}^2$, the variance
of $u_j$, is an invariant and is called the uniqueness of $x_j$. Furthermore, the
total variance of $x_j$, taken as unity, is the sum of the variances of its common
and unique parts:

$$\sigma_{c_j}^2 + \sigma_{u_j}^2 = 1. \tag{44}$$

Equation (44) parallels identity (19), with the communality playing the
role of the partial norm and the uniqueness the role of the partial antinorm.
A further parallel to an identity of image theory is the equation

$$E_i\, u_{ji} x_{ki} = 0, \qquad (j \neq k). \tag{45}$$
That (45) follows from the common-factor hypotheses can be seen by multiplying
both members of (40) by $u_{ki}$, taking expectations over $i$, using (42)
and (43), and revising subscripts. Equation (45) is parallel to the fundamental
equation (15) of image analysis. Indeed, (45) can be used as a starting point
for the common-factor hypotheses in place of (42). Equation (42) can be
derived directly from (45) and (44) for $j \neq k$; and if $m < n$, it can also be
seen that (42) holds for $j = k$. Starting with (45) in place of (42) will be a
convenient way for us to compare common-factor theory with image theory.

The way common-factor theory “explains” the observed intercorrelations
$r_{jk}$ is by means of its fundamental factor equation,

$$r_{jk} = E_i\, c_{ji} c_{ki}, \qquad (j \neq k), \tag{46}$$

which follows from (40), (42), and (43). Common-factor analysts traditionally
expand the right member of (46) in terms of some set of $y_f$, using (41), but
this is irrelevant to the present discussion.

We can now summarize and compare the bases and consequences of 
image theory and of common-factor theory as in Table 1. 


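TABLE 1

Comparison of Characteristics of Image Theory and Common-Factor Theory

                      Image Theory                                                      Common-Factor Theory

Basic Partition       $x_{ji} = p_{ji}^{(n)} + e_{ji}^{(n)}$                            $x_{ji} = c_{ji} + u_{ji}$

Basic Definition      $p_{ji}^{(n)} = \sum_{k=1}^{n} w_{jk}^{(n)} x_{ki}$               ...

Basic Restrictions
  (a):                $E_i\, e_{ji}^{(n)} x_{ki} = 0$, $(j \neq k)$                     $E_i\, u_{ji} x_{ki} = 0$, $(j \neq k)$
  (b):                ...                                                               $E_i\, u_{ji} u_{ki} = 0$, $(j \neq k)$

Consequences
  (a):                ...                                                               $E_i\, u_{ji} c_{ki} = 0$, $(j \neq k)$
  (b):                $E_i\, e_{ji}^{(n)} p_{ji}^{(n)} = 0$                             $E_i\, u_{ji} c_{ji} = 0$, $(m < n)$
  (c):                $\rho_{jn}^2 + \sigma_{jn}^2 = 1$                                 $\sigma_{c_j}^2 + \sigma_{u_j}^2 = 1$
  (d):                $r_{jk} = g_{jk}^{(n)} - \gamma_{jk}^{(n)}$, $(j \neq k)$         $r_{jk} = E_i\, c_{ji} c_{ki}$, $(j \neq k)$
  (e):                $r_{jk} = g_{jk}^{(n)} + \tau_{jk}^{(n)} \sigma_{jn} \sigma_{kn}$, $(j \neq k)$    ...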

10. The Special Case of Determinate Common Factors


According to Table 1, common-factor theory lacks a basic definition
for its common parts $c_{ji}$, and has two restrictions on the deviant parts
compared to only one restriction for image theory. The single restriction of
image theory can always be satisfied, making the basic definition unique, so
that the consequences are all identities or tautologies; they are universally
true. In contrast, the restrictions of common-factor theory do not generally
suffice to define any particular partition of the observations; more than one
common-factor partition can be found in general, with different $u_{ji}$ and
$c_{ji}$, to satisfy the restrictions. For example, if a set of $c_{ji}$ of rank $m$ can be
found to satisfy the restrictions on the $u_{ji}$, then certainly a set of rank
$m + 1$ can also be found, yielding new $u_{ji}$ which also satisfy the restrictions.
Or more than one set of $c_{ji}$ can usually be found with the same rank $m$.
Two different sets satisfying the restrictions cannot be obtained from each
other by rotations within one of their common-factor spaces, for any set
$c_{ji}$ is invariant under rotations.

This highlights one of the basic problems of indeterminacy of common- 
factor theory. More than one total common-factor space can satisfy the 
same data. (To repeat, this problem of indeterminacy is entirely different 
from that of rotation of axes, which takes place within a given common- 
factor space.) 

This indeterminacy can be removed by introducing the notion of a
determinate common-factor space. Such a space is one in which there is a
perfect regression for each common factor $y_f$ on the observed $x_j$. For finite
$n$, a common-factor space is in general indeterminate; the common-factor
scores can only be estimated from the $x_j$, with positive variances of errors
of estimate. As $n$ increases, the errors of estimate decrease; if the limit of
the errors of estimate is zero, then the common factors are perfectly determinate
in the limit. General conditions under which a common-factor space
is determinate have been established elsewhere (2); essentially they are
that there exist a limit to $R_n^{-1}$, which is a diagonal matrix.

Instead of dealing with separate common factors $y_f$ here, let us define
the determinateness of a total common-factor space of rank $m$ as follows.
Let $b_{jk}^{(n)}$ be the regression coefficient of $x_k$ for predicting $c_j$ in the multiple
regression of $c_j$ on $n$ observed variables. Then the common-factor space of
the $c_j$ will be called determinate if and only if, for all $j$*,

$$c_{ji} = \lim_{n \to \infty} \sum_{k=1}^{n} b_{jk}^{(n)} x_{ki}. \tag{47}$$

Condition (47) can be seen to be parallel to our definition of total images
in (12). If we now fill in condition (47) as a basic definition in the table of
the previous section, it easily follows that only basic restriction (a) in the

*Except possibly for a zero proportion of the population. 
table is now needed to prove that* 

$$c_{ji} = p_{ji}, \tag{48}$$

or the common parts of images and of common-factor theory are identical in 
the limit*. For the proof of (48), consider the quantity $\delta_{jn}^2$ defined by 

$$\delta_{jn}^2 = E_i\,[c_{ji} - p_{ji}^{(n)}]^2 = E_i\,[u_{ji} - e_{ji}^{(n)}]^2. \tag{49}$$


That the last member equals the middle member follows directly from the 
basic partitions. Expanding the last member shows that 


$$\delta_{jn}^2 = \sigma_{u_j}^2 - 2\,E_i\,u_{ji}\,e_{ji}^{(n)} + \sigma_{e_{jn}}^2. \tag{50}$$


To evaluate the middle term on the right, multiply both members of (13) 
by $u_{ji}$ and take expectations over i, remembering (11), (10), and (45), whence 

$$E_i\,u_{ji}\,e_{ji}^{(n)} = E_i\,u_{ji}\,x_{ji} = \sigma_{u_j}^2. \tag{51}$$


That the last member of (51) is equal to the middle member is well-known 
and can be seen by multiplying (40) through by $u_{ji}$, taking expectations, 
and using (42). Using (51) in (50) shows that 

$$\delta_{jn}^2 = \sigma_{e_{jn}}^2 - \sigma_{u_j}^2 = \sigma_{c_j}^2 - \rho_{jn}^2, \tag{52}$$

the last member following from the middle member by virtue of consequences 
(19) and (44). 

Since the first member of (52) is nonnegative, the last member provides 
us with a new proof of a previously known theorem that a communality of 
$x_j$ is always an upper bound to the square of the multiple correlation coefficient 
or partial norm of $x_j$ (2, 92-93). For a given set of n variables, more than one 
set of communalities can exist, but in all cases these communalities cannot 
be less than the corresponding, uniquely defined, partial norms. The 
closer a communality of $x_j$ to the partial norm of $x_j$, then, according to 
(49) and (52), the closer the $c_{ji}$ to the $p_{ji}^{(n)}$, and the closer the $u_{ji}$ to the $e_{ji}^{(n)}$. 

Now, the $c_{ji}$ and $u_{ji}$ (hence also $\sigma_{c_j}^2$ and $\sigma_{u_j}^2$) do not depend on n. It 
is assumed that there is a fixed number m of common factors in the entire 
universe of content, and these will appear in any sample of n variables where 
n > m. In contrast, the image partition changes with n, with the partial 
norms always increasing (or at worst remaining constant). Our Theorem 1 
above states that limiting or total norms always exist. We can now state 
further what the limiting values are if (47) holds; for it has been shown in 
(2) that if (47) holds, then also 


$$\sigma_{c_j}^2 = \rho_j^2, \tag{53}$$


or any communality equals the total norm. Taking the limit in (52) as $n \to \infty$ 
and using (53) shows that 

$$\lim_{n\to\infty} \delta_{jn}^2 = 0, \tag{54}$$


which then establishes (48). These results can be stated as: 


THEOREM 4: If a common-factor space of rank m is determinate for an 
infinitely large universe of content, then there is no other determinate common- 
factor space possible for the same universe—whether of rank m or of any other 
rank. The communalities are uniquely determined and are equal to the correspond- 
ing total norms. The common-factor scores are the total image scores, and the 
unique factor scores are the total anti-image scores. 


We have now completed the demonstration of how common-factor 
theory is a special case of image theory. For a common-factor theory to be 
useful, it should be determinate; otherwise there is no uniquely defined 
common-factor space; and furthermore, common-factor scores cannot be 
estimated closely for use in practice. If the theory is determinate, it becomes 
a special case of image theory, with the special restriction that total anti- 
images are uncorrelated. 


11. A Universal Computing Procedure 


The fundamental identity (25) provides a computing procedure to 
analyze the structure of the interrelationships of n variables from any 
universe of content. Let $\Gamma_n$ be the Gramian matrix of the covariances $\gamma_{jk}$, 
and let $S_n^2$ be the diagonal matrix defined by the partial antinorms. These 
matrices can both be computed easily by first computing $R_n^{-1}$, according 
to (5) and (36). Then, considering also (37), we have the working formula, 

$$\Gamma_n = S_n^2 R_n^{-1} S_n^2, \tag{55}$$

or $\Gamma_n$ is computed from $R_n^{-1}$ merely by premultiplication and 
postmultiplication with the diagonal matrix $S_n^2$. 

Once $\Gamma_n$ is computed, it is easy to compute $G_n$, the Gramian matrix of 
the covariances $g_{jk}$; for by referring to (25), we can write 

$$G_n = R_n + \Gamma_n - 2S_n^2. \tag{56}$$

Only matric addition is needed to compute $G_n$ according to (56). The diagonal 
matrix $2S_n^2$ has been subtracted in the right of (56) to make the main diagonals 
consistent; (25) does not define the main diagonals, which have to be 
considered separately. 

In the special case of determinate common factors, the nondiagonal 
elements of $\Gamma_n$ should all be close to zero, for their limit as $n \to \infty$ is 
hypothesized to be zero. If not all nondiagonal elements are close to zero, one 
is led to reject the hypothesis that a determinate common-factor space of rank 
less than n holds for the universe. One could then examine $\Gamma_n$ to see if any special order 
exists among the nonvanishing pairwise linkages. It is best, of course, to have 
a preliminary order theory before one examines the empirical evidence. Exam- 
ples of preliminary hypotheses are the simplex and circumplex (7). Ultimately, 
special theories may have to be developed for each special kind of content. 

An important paradox is that the anti-image covariances are more 
basic to the structural analysis of $R_n$ than are the image covariances. If we 
know $\Gamma_n$, we know $R_n$, for from (55) we have immediately: 

$$R_n = S_n^2 \Gamma_n^{-1} S_n^2. \tag{57}$$

This is not the case with $G_n$; knowing $G_n$ is not sufficient for determining $R_n$. 
In this sense, $R_n$ is determined by the alien parts, rather than by the common 
parts. 

Another way of stating this paradox is to express $G_n$ itself as a function 
of $\Gamma_n$. From (56) and (57), we have 

$$G_n = S_n^2 \Gamma_n^{-1} S_n^2 + \Gamma_n - 2S_n^2. \tag{58}$$

Given $\Gamma_n$, we can compute $G_n$ through (58). The converse is not true; $G_n$ by 
itself does not determine $\Gamma_n$. In the general case, then, structural theories 
should be based on the anti-images, rather than on the images. 
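
The working formulas (55) through (57) can be sketched in a few lines of code. The sketch below is only an illustration under stated assumptions: R is taken to be a nonsingular correlation matrix, the partial antinorms are taken as the reciprocals of the diagonal elements of the inverse of R (the usual value of one minus the squared multiple correlation), and the function names `inverse`, `anti_image_cov`, `image_cov`, and `r_from_gamma` are hypothetical names chosen here, not anything from the paper.

```python
def inverse(a):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(a)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for row in range(n):
            if row != col:
                f = aug[row][col]
                aug[row] = [v - f * w for v, w in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


def anti_image_cov(r):
    """Formula (55): Gamma = S^2 R^-1 S^2. Returns (Gamma, diagonal of S^2)."""
    r_inv = inverse(r)
    n = len(r)
    # Assumption: antinorm of variable j is 1 / (j-th diagonal element of R^-1).
    s2 = [1.0 / r_inv[j][j] for j in range(n)]
    gamma = [[s2[j] * r_inv[j][k] * s2[k] for k in range(n)] for j in range(n)]
    return gamma, s2


def image_cov(r):
    """Formula (56): G = R + Gamma - 2 S^2 (S^2 affects the diagonal only)."""
    gamma, s2 = anti_image_cov(r)
    n = len(r)
    return [[r[j][k] + gamma[j][k] - (2.0 * s2[j] if j == k else 0.0)
             for k in range(n)] for j in range(n)]


def r_from_gamma(gamma):
    """Formula (57): R = S^2 Gamma^-1 S^2, with S^2 read off Gamma's diagonal."""
    n = len(gamma)
    s2 = [gamma[j][j] for j in range(n)]
    g_inv = inverse(gamma)
    return [[s2[j] * g_inv[j][k] * s2[k] for k in range(n)] for j in range(n)]
```

Under these assumptions the diagonal of the computed anti-image matrix reproduces the antinorms, which is why `r_from_gamma` needs no input other than the anti-image matrix itself; no comparable function can be written for the image matrix alone.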

In a later paper, we shall discuss the general theory of the matrices 
of linear least-squares image analysis and show some further intimate 
connections between $\Gamma_n$ and $G_n$ as well as with other matrices that occur naturally 
in the theory (8). 

If a determinate common-factor theory holds, so that $\Gamma_n$ tends to a 
diagonal matrix (namely $S^2$), then we have only $G_n$ to deal with and no initial 
clues as to its structure. An order-theory may still hold within $G_n$ regardless. 
If not, one is up against the problem of rotation of axes that is traditional 
to common-factor theory. But at least the indeterminacies of the 
communalities and of the common-factor space have been removed. 

Empirically, it has often been found that multiple correlations tend to 
taper off rapidly as the number of predictors increases. It may often be 
expected in practice, then, that with n greater than 10 or 15, say, partial 
images and norms will be so close to their total images and norms that the 
differences will be negligible, and n can be regarded as virtually “infinite.” 
Some order structures like the circumplex may require a larger number of 
tests than others in order to piece out the necessary details. 

On the other hand, it is well-known that the sampling errors associated 
with multiple regressions can be quite enormous if the sample of people 
used is small. Only large samples of people can be reliably used if n is 
substantial, and this is as it ought to be. Any structural analysis of a universe 
requires a large sample of people. Small samples may be adequate for 
rejecting null hypotheses of zero relationships, but they are not adequate for 
estimating the details of involved nonzero relationships. In the future, the 
required sampling theory will undoubtedly be forthcoming which will indicate 
whether 500, 3,000, or some larger sample of people is needed in order 
adequately to study the structure of a given universe of content on the basis of 
a sample of n variables. 


A GENERALIZATION OF THURSTONE’S LEARNING FUNCTION* 

Thurstone’s equation giving the probability of a correct response (p) as a 
function of practice time (t) when punishment and reward have equal effects 
has been generalized to the case where the effect of punishment is not neces- 
sarily equal to the effect of reward. Since the general equation is somewhat 
unwieldy, three special cases are considered, where reward has no effect, where 
punishment has no effect, and where these effects are equal. Equations are 
given together with tables for making a rectified plot for each of the three 
special cases. 


This paper presents a generalization of the learning function developed 
by Thurstone (2) and will discuss certain interesting special cases of this 
general equation. 


Definitions and Assumptions 
Following Thurstone’s development we will define the following variables: 

s = the strength of the correct response, or the number of unit correct 
responses available to the organism at any moment. 

e = the strength of the incorrect response, or the number of unit 
incorrect responses available to the organism at any moment. 

p = the probability of a correct response. 

q = the probability of an incorrect response. 

t = practice time. 

The relationship between p, q, s, e, and t is assumed to be given by the 
following equations: 

$$p = \frac{s}{s+e} \tag{1}$$

and 

$$q = \frac{e}{s+e}. \tag{2}$$


*This study was supported in part by contract N6onr 270-20 between the Office of 
Naval Research and Princeton University. The opinions expressed are, of course, those of 
the author and do not represent attitudes or policies of the Office of Naval Research. 
These two assumptions are identical with assumptions [1] and [2] from 
Thurstone (2). It is also assumed that the variation of s and e with respect 
to time is given by 

$$\frac{ds}{dt} = kp \tag{3}$$

and 

$$\frac{de}{dt} = -cq. \tag{4}$$


These two assumptions are similar to Thurstone’s assumptions [4] and 
[6] but are more general because it is not assumed that c = k. In Thurstone’s 
development it was assumed that the effect of rewarding the correct response 
(k) was equal to the effect of punishing the incorrect response (c). Here the 
more general case in which these parameters may be either the same or 
different is being considered. 

From the foregoing set of four equations it is possible to derive the 
functional relationship between the probability of a correct response (p) 
and the practice time (t). It might be noted that the assumptions used here 
are essentially the same as those used in a former paper (1) except that 
there the functional relationship between cumulative errors and cumulative 
correct responses was obtained, while here the interest is in the relationship 
between two other variables, practice time (t) and the probability of a correct 
response (p). 


Derivation of the General Case 


Substituting (2) in (4) gives 

$$\frac{de}{dt} = \frac{-ce}{s+e}. \tag{5}$$

If s is expressed as a function of e, and this function substituted in 
(5), (5) will then be a differential equation in e and t only and may be 
integrated readily. We can obtain s as a function of e by the following procedure: 
Dividing (3) by (4) gives 

$$\frac{ds}{de} = -\frac{kp}{cq}. \tag{6}$$

Dividing (1) by (2) gives 

$$\frac{p}{q} = \frac{s}{e}. \tag{7}$$

Substituting (7) in (6) we have 

$$\frac{ds}{de} = -\frac{ks}{ce}. \tag{8}$$
Integrating (8) and evaluating the constant of integration by noting that 
when t = 0, $s = s_0$ and $e = e_0$, we obtain 

$$\log\frac{s}{s_0} = -\frac{k}{c}\,\log\frac{e}{e_0}. \tag{9}$$

Taking antilogarithms of both sides and solving for s gives 

$$s = s_0\left(\frac{e}{e_0}\right)^{-k/c}. \tag{10}$$

Equation (10) expresses s as a function of e and certain constants. Substituting 
(10) in (5) and separating variables gives 

$$\frac{s_0}{e_0^{-k/c}}\,e^{-(c+k)/c}\,de + de = -c\,dt. \tag{11}$$

Integrating (11) and evaluating the constant of integration by setting $e = e_0$ 
when t = 0 gives 

$$e - \frac{c}{k}\,s_0\left(\frac{e}{e_0}\right)^{-k/c} = -ct + e_0 - \frac{c}{k}\,s_0. \tag{12}$$


Equation (12) gives e as a function of t. In order to obtain the functional 
relationship between p and t, it is necessary to obtain e as a function of p 
and substitute in (12). This can be done as follows: Substituting (10) in 
(7) and solving explicitly for e gives 

$$e = s_0^{c/(c+k)}\,e_0^{k/(c+k)}\left(\frac{q}{p}\right)^{c/(c+k)}. \tag{13}$$

Substituting (13) in (12) and rearranging terms gives the final solution as 
follows: 

$$\frac{c}{k}\left(\frac{p}{q}\right)^{k/(c+k)} - \left(\frac{q}{p}\right)^{c/(c+k)} = \frac{ct}{s_0^{c/(c+k)}\,e_0^{k/(c+k)}} - \left(\frac{q_0}{p_0}\right)^{c/(c+k)} + \frac{c}{k}\left(\frac{p_0}{q_0}\right)^{k/(c+k)}, \tag{14}$$

where 
p = the probability of a correct response, 
q = 1 − p, 
k = the effect of reward, 
c = the effect of punishment, 
$p_0$ = the value of p when t = 0, 
$q_0$ = 1 − $p_0$, 
$e_0$ = the value of e when t = 0, and 
$s_0$ = the value of s when t = 0. 
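
As a rough numerical check on equation (14), the following sketch integrates assumptions (3) and (4) directly and compares the result with the closed form. It is illustrative only: the function names `practice_time` and `simulate_p`, the fourth-order Runge-Kutta scheme, and the step count are choices made here, and the closed form is used in the version that requires both c and k to be positive.

```python
def practice_time(p, c, k, s0, e0):
    """Solve equation (14) for t at a given p (needs c > 0 and k > 0)."""
    a = k / (c + k)
    b = c / (c + k)
    p0 = s0 / (s0 + e0)  # equation (1) at t = 0

    def z(prob):
        # Left member of (14): (c/k)(p/q)^(k/(c+k)) - (q/p)^(c/(c+k)).
        odds = prob / (1.0 - prob)
        return (c / k) * odds ** a - (1.0 / odds) ** b

    scale = s0 ** b * e0 ** a
    return (z(p) - z(p0)) * scale / c


def simulate_p(t_end, c, k, s0, e0, steps=20000):
    """Integrate ds/dt = k*p and de/dt = -c*q by RK4; return p at t_end."""
    def rates(s, e):
        total = s + e
        return k * s / total, -c * e / total

    h = t_end / steps
    s, e = float(s0), float(e0)
    for _ in range(steps):
        k1s, k1e = rates(s, e)
        k2s, k2e = rates(s + 0.5 * h * k1s, e + 0.5 * h * k1e)
        k3s, k3e = rates(s + 0.5 * h * k2s, e + 0.5 * h * k2e)
        k4s, k4e = rates(s + h * k3s, e + h * k3e)
        s += h * (k1s + 2 * k2s + 2 * k3s + k4s) / 6.0
        e += h * (k1e + 2 * k2e + 2 * k3e + k4e) / 6.0
    return s / (s + e)
```

For example, simulating to some practice time with chosen constants and then feeding the resulting p back into `practice_time` should return approximately the same practice time, which is the sense in which (14) solves the four assumptions.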
Differentiating (14) we have 

$$\frac{c}{c+k}\left[\frac{1}{q^2}\left(\frac{p}{q}\right)^{-c/(c+k)} + \frac{1}{p^2}\left(\frac{q}{p}\right)^{-k/(c+k)}\right]dp = \frac{c}{s_0^{c/(c+k)}\,e_0^{k/(c+k)}}\,dt.$$

Solving explicitly for the derivative and simplifying gives 

$$\frac{dp}{dt} = (c+k)\,pq\left(\frac{p}{s_0}\right)^{c/(c+k)}\left(\frac{q}{e_0}\right)^{k/(c+k)}, \tag{15}$$

which is always positive and approaches 0 as either p or q approaches 0. 
The second derivative is 

$$\frac{d^2p}{dt^2} = (c+k)\,pq\left(\frac{p}{s_0}\right)^{2c/(c+k)}\left(\frac{q}{e_0}\right)^{2k/(c+k)}\,[2c + k - 3(c+k)p]. \tag{16}$$

As in the case of the first derivative, the second approaches 0 when p 
approaches 0 or when q approaches 0. The inflection point is given by the 
remaining solution, 

$$p = \frac{2c+k}{3c+3k}. \tag{17}$$


It can be seen that as c and k take different values between 0 and plus 
infinity, the inflection point shifts from p = 1/3, when c = 0, to p = 2/3 
when k = 0. The inflection point is at p = 1/2 when c = k. It can be seen 
that as long as c and k both remain positive, the inflection point can never 
be lower than p = 1/3 or higher than p = 2/3. Assuming that c and k are 
never negative is equivalent to assuming that reward never decreases the 
strength of the correct response and that punishment never increases the 
strength of the incorrect response. 
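
The range of the inflection point can be illustrated numerically. The sketch below, with hypothetical function names `inflection_point` and `growth_rate`, evaluates (17) and locates the maximum of the first derivative (15) on a grid; the default values of 1 for the initial strengths are arbitrary and only rescale the rate.

```python
def inflection_point(c, k):
    """Equation (17): value of p at which the learning curve inflects."""
    return (2.0 * c + k) / (3.0 * c + 3.0 * k)


def growth_rate(p, c, k, s0=1.0, e0=1.0):
    """Equation (15): dp/dt as a function of p, for 0 < p < 1."""
    b = c / (c + k)
    a = k / (c + k)
    q = 1.0 - p
    return (c + k) * p * q * (p / s0) ** b * (q / e0) ** a


def steepest_p(c, k, points=10000):
    """Grid search for the p at which dp/dt is largest."""
    grid = [i / points for i in range(1, points)]
    return max(grid, key=lambda p: growth_rate(p, c, k))
```

The grid maximum of the rate should agree with (17) to within the grid spacing, including at the two extremes c = 0 and k = 0, where (17) gives 1/3 and 2/3.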

Equation (14) gives the general relationship between p and ¢ and allows 
the effect of punishment to be different from the effect of reward. It is an 
equation which is very difficult to fit. Also, the general shape of the curve 
does not alter much for large changes in the values of c and k. As c and k 
vary from 0 to ∞ the inflection point shifts only from p = 1/3 to p = 2/3 
as is shown by equation (17). If the inflection point of the empirical curve 
can be rather accurately estimated it might be worth while trying to fit the 
general case. However, three special cases can be handled rather readily, 
and these cases will now be considered. 


Three Special Cases 
Case 1: c = k. The effect of punishment is equal to the effect of reward. 

Case 2: c = 0. Learning is entirely due to reward for the correct response. 

Case 3: k = 0. Learning is due entirely to the punishment of the incorrect 
response. 
Case 1: c = k. In this case equation (14) becomes 

$$z = mt + z_0, \tag{18}$$

where 

$$z = \left(\frac{p}{q}\right)^{1/2} - \left(\frac{q}{p}\right)^{1/2}, \qquad m = \frac{k}{(s_0 e_0)^{1/2}}, \qquad z_0 = \left(\frac{p_0}{q_0}\right)^{1/2} - \left(\frac{q_0}{p_0}\right)^{1/2}.$$

This is the equation previously developed by Thurstone (2). It can be fitted 
readily in its rectified form using the table of z as given by Thurstone (2), 
which is the same as Table 1*. Taking the first derivative of p with respect 
to t for equation (18) gives 

$$\frac{dp}{dt} = \frac{2k\,q^{3/2}p^{3/2}}{(s_0 e_0)^{1/2}}, \tag{19}$$

which is always positive. That is, the probability of a correct response 
increases steadily with practice time. The second derivative of p with respect 
to t is 

$$\frac{d^2p}{dt^2} = \frac{6k^2}{s_0 e_0}\,p^2q^2(1 - 2p). \tag{20}$$


Setting (20) equal to zero we find the inflection point is 


p = 1/2. (21) 


It can be seen from equation (18) that as p approaches zero, t approaches 
−∞, and as p approaches 1, t approaches +∞. Equation (18) then 
represents a symmetrical curve asymptotic to p = 0 and p = 1 with an inflection 
point at p = 1/2. 


Case 2: c = 0. Learning is due entirely to the effect of rewarding the 
correct response. For this case equations (1), (2), and (3) remain the same 
as in the general case, while equation (4) becomes 

$$\frac{de}{dt} = 0, \quad\text{or}\quad e = e_0. \tag{22}$$

Substituting equations (1) and (22) in (3) we have 

$$\frac{ds}{dt} = \frac{ks}{s+e_0}. \tag{23}$$


*The computing for this and subsequent tables was done by Mrs. Gertrude Diederich. 
Integrating (23), setting $s = s_0$ when t = 0, and substituting function p 
from equation (1) for s gives the solution 

$$\frac{p}{1-p} + \log\frac{p}{1-p} = \frac{k}{e_0}\,t + \frac{p_0}{q_0} + \log\frac{p_0}{q_0}, \tag{24}$$

where 

p = the probability of a correct response, 
t = practice time, 
k = the effect of reward, 
$e_0$ = the value of e when t = 0, 
$p_0$ = the value of p when t = 0, and 
$q_0$ = 1 − $p_0$. 


This equation is in rectified form since the left side is a function of p 
only and the right side is a linear function of t. It can be fitted very easily 
by making a rectified plot using the values for function p [the left-hand 
side of equation (24)] given in Table 2. 
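
A rectified fit of this kind can be sketched as follows. The sketch assumes that the logarithm in (24) is the natural logarithm (as the integration implies) and substitutes an ordinary least-squares line for the graphical fit; the function names `reward_only_function` and `fit_line` are hypothetical, and the tabled values of Table 2 are not reproduced here.

```python
import math


def reward_only_function(p):
    """Left-hand side of equation (24): p/(1-p) + log(p/(1-p))."""
    odds = p / (1.0 - p)
    return odds + math.log(odds)


def fit_line(xs, ys):
    """Least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Given observed pairs of practice time and proportion correct, one would transform each proportion with `reward_only_function` and pass the times and transformed values to `fit_line`; under (24) the slope estimates the ratio of k to the initial strength of the incorrect response, and the intercept estimates the transformed initial probability.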

Equation (24) may also be obtained by setting c = 0 in equation (14). 
Under these conditions equation (14) becomes indeterminate and can be 
evaluated by methods of the calculus to give equation (24). 

The first derivative of equation (24) is 

$$\frac{dp}{dt} = \frac{k}{e_0}\,p(1-p)^2, \tag{25}$$

which is always positive, indicating that p increases continually with practice 
time. The second derivative is 

$$\frac{d^2p}{dt^2} = \frac{k^2}{e_0^2}\,p(1-p)^3(1-3p). \tag{26}$$

The second derivative is 0 and the first is positive when 

$$p = 1/3, \tag{27}$$


which is the inflection point of equation (24). It can also be seen that (24) 
is asymptotic to p = 0 and p = 1. That is, according to this equation, when 
learning is dependent on reward only, the inflection point comes at p = 1/3 
and the curve approaches the upper asymptote very slowly. This is equivalent 
to the case for c = 0 previously developed (1, 416), where it is shown that 
the cumulative error curve under these conditions has no upper asymptote. 
That is, the subject’s learning record is approaching zero errors per trial, 
but does so so slowly that the total number of errors made increases without 
limit. The errors constitute a nonconvergent series when the effect of punish- 
ment is negligible. 
Case 3: k = 0. Learning is dependent entirely on the effect of punishment. 
For this case, equations (1), (2), and (4) remain the same as for the general 
case, while equation (3) becomes 

$$\frac{ds}{dt} = 0, \quad\text{or}\quad s = s_0. \tag{28}$$

Substituting equations (2) and (28) in equation (4) gives 

$$\frac{de}{dt} = \frac{-ce}{s_0+e}. \tag{29}$$

Integrating equation (29), evaluating the constant of integration by setting 
$e = e_0$ when t = 0, and substituting function q from equation (2) for e gives 
the solution 

$$\log\frac{p}{1-p} - \frac{1-p}{p} = \frac{c}{s_0}\,t + \log\frac{p_0}{1-p_0} - \frac{1-p_0}{p_0}, \tag{30}$$

where 
p = the probability of a correct response, 
t = practice time, 
c = the effect of punishment, 
$s_0$ = the initial strength of the correct response, and 
$p_0$ = the value of p when t = 0. 


This equation is in rectified form since the left-hand member is a function 
of p only and the right-hand member is a linear function of t. It can be easily 
fitted by using the values for function p [the left-hand side of equation 
(30)] given in Table 3. By plotting these values of function p against t, 
it can be seen whether the plot is linear or not, and if it is linear the values 
for the slope and intercept can be determined either by graphically fitting 
a straight line and reading the values from the graph or by more precise 
methods. It may be noted that equation (30) can be obtained from equation 
(24) by substituting p for q, q for p, and −t for t. It may also be obtained by 
setting k = 0 in equation (14), and evaluating the resulting indeterminate 
expression. 
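
The substitution relating (30) to (24) can be checked numerically. In this sketch the two function names are hypothetical and natural logarithms are assumed; interchanging p and q in the left member of (24) and reversing the sign (which corresponds to replacing t by −t) should reproduce the left member of (30).

```python
import math


def reward_only_function(p):
    """Left-hand side of equation (24): p/(1-p) + log(p/(1-p))."""
    odds = p / (1.0 - p)
    return odds + math.log(odds)


def punishment_only_function(p):
    """Left-hand side of equation (30): log(p/(1-p)) - (1-p)/p."""
    return math.log(p / (1.0 - p)) - (1.0 - p) / p


def mirrored(p):
    """Left side of (24) with p and q interchanged and the sign reversed."""
    return -reward_only_function(1.0 - p)
```

For any p strictly between 0 and 1, `mirrored(p)` and `punishment_only_function(p)` should agree up to rounding, so a table for one case can in principle serve the other after the interchange.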


The first derivative of (30) is 

$$\frac{dp}{dt} = \frac{c}{s_0}\,p^2(1-p), \tag{31}$$

which is always positive, indicating that p increases with practice time. 
The second derivative of equation (30) is 

$$\frac{d^2p}{dt^2} = \frac{c^2}{s_0^2}\,p^3(1-p)(2-3p). \tag{32}$$

The second derivative is 0 and the first is positive when 

$$p = 2/3, \tag{33}$$


which is the inflection point of equation (30). It can also be seen that (30) 
is asymptotic to p = 0 and p = 1. That is, according to this equation, when 
learning is dependent on punishment only the inflection point comes at 
p = 2/3 and the curve approaches the upper asymptote very rapidly. This 
equation is similar to that given for the case k = 0 (1, 417). 

It might be noted that the equations given here, equation (14), the 
general form, and also the three special cases, equation (18) for the case 
c = k, equation (24) for the case c = 0, and equation (30) for the case k = 0, 
are identical with those previously derived (1) except for a change in the 
variables. In the former paper the variable (u) representing cumulative 
errors was given as a function of the variable (w), cumulative correct responses. 
This paper gives the probability of a correct response (p) as a function of 
practice time (t). The equations given in the previous paper can be obtained 
from the corresponding ones given here by substituting u + w for t, 
dw/(du + dw) for p, and making the appropriate rearrangement of terms. 

The general equation [20] in the previous paper corresponds to the 
equation (14) in this paper. The case where c = k, [equation (27) in 2] is 
identical with equation (18) given here and similar, except for the change of 
variables, to equation (30) in (3). The case where c = 0, learning by reward 
only, is given on page 416 of (1) and in equation (24) here. The case where 
k = 0, learning by punishment only, is given on page 417 of (1) and corresponds 
to equation (30) in the present paper. 
Maximizing the discriminating power of a multiple-score test involves 
maximizing the homogeneity of each subtest and minimizing the correlations 
between subtests. A method is presented for constructing such tests from 
items whose intercorrelations are not too high. Under certain restrictions the 
saturation, defined as the proportion of inter-item covariance to total 
variance, is maximized for each subtest. The nucleus of each subtest is three items 
with high covariances inter se. All items which will lower the saturation are 
discarded; the one item is added which will maximize the saturation of the 
resultant test. This process is repeated until all the items are included or discarded 
for that subtest. If the correlation between any such subtests approaches the 
geometric mean of their saturations, their items form a new pool for one or 
more subtests. Formulas are presented for deciding which items to eliminate 
in order to reduce further the correlations between subtests. 


I. Some Theoretical Considerations 


For a heterogeneous group of items it is often desirable to develop 
scoring keys such that each key will constitute a homogeneous subtest and 
the keys in conjunction will provide maximum discrimination, i.e., will be 
minimally intercorrelated. To date, no rigorous method has been available 
which handles these two requirements simultaneously. Factor analysis, a 
possible method, has many drawbacks. Aside from the technical difficulties 
in factoring a large pool of items, a major objection is that the basic assump- 
tion that each item score is the weighted sum of several factors does not fit 
in with the practical problem of assigning an item to a subtest on an all-or- 
none basis. Furthermore, the estimation of communalities in order to deter- 
mine the number of factors to extract is no more rigorous than the procedures 
presented here. 

The aim in constructing homogeneous tests may be expressed as 
maximizing the discriminating power of the test, which has three aspects: fineness 
of discrimination, probability of correct discrimination with respect to 
whatever the test measures, and range of discrimination. 

If one conceives of test construction as adding items one at a time to a 
small nucleus, drawing from a finite pool of items in order of the goodness of 
the items, then coefficients to measure the goodness of the test can be divided 
into three groups. Coefficients which measure primarily the fineness of 
discrimination will tend to increase with the addition of items. Coefficients 
which measure primarily the probability of correct discrimination, which is 
either the same as or closely related to the factorial purity of the test, will 
tend to decrease with the addition of items. Coefficients which measure both 
the fineness of discrimination and the probability of correct discrimination 
may increase at first and then decrease. Intuitively, one feels the need for 
such a maximizing function to aid in deciding when to stop adding items. 

*This research was supported in part by the United States Air Force under Contract 
AF 33(038)-10588 with Human Resources Research Center, Lackland Air Force Base, 

Two coefficients previously proposed, Ferguson’s (2) coefficient of test 
discrimination, and Kuder and Richardson’s (4) formula 20 (hereafter 
referred to as KR 20), have this maximizing property. Ferguson’s coefficient 
lacks algebraic properties and has not been related to an explicit system of 
test construction. Our method of test construction is based mainly on the 
saturation coefficient, defined as the ratio of the sum of all the inter-item 
covariances to the total variance of the test. KR 20 is equal to the saturation 
coefficient times n/(n — 1), where n is the number of items. 

The variance of a test may be expanded as a function of the variances 
and covariances of the items: 

$$V_t = \sum_{i=1}^{n} V_i + 2\sum_{i<j=1}^{n} C_{ij}, \tag{1}$$

where $V_i$ is the variance of item i, $V_t$ is the variance of the test, and $C_{ij}$ is 
the covariance of item i with item j. The saturation coefficient, S, is 

$$S = 2\sum_{i<j=1}^{n} C_{ij} \Big/ \left(\sum_{i=1}^{n} V_i + 2\sum_{i<j=1}^{n} C_{ij}\right) = \left(V_t - \sum_{i=1}^{n} V_i\right) \Big/ V_t. \tag{2}$$
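
A small computational sketch of formula (2), and of the stated relation between the saturation coefficient and KR 20, is given below. It assumes that the full item covariance matrix is available as a list of lists with the item variances on the diagonal; the function names `saturation` and `kr20` are hypothetical.

```python
def saturation(cov):
    """Formula (2): (V_t - sum of item variances) / V_t."""
    n = len(cov)
    item_variance = sum(cov[i][i] for i in range(n))
    # By formula (1), the test variance is the sum of every matrix element.
    test_variance = sum(sum(row) for row in cov)
    return (test_variance - item_variance) / test_variance


def kr20(cov):
    """KR 20 taken as the saturation coefficient times n / (n - 1)."""
    n = len(cov)
    return saturation(cov) * n / (n - 1)
```

For three items with variances of 0.25 and all covariances equal to 0.05, the test variance is 1.05, so the saturation is 0.30/1.05 and KR 20 is 1.5 times that value.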

Maximizing the saturation of a test drawn from a finite pool of items 
will not necessarily maximize the discriminating power of the test except 
under certain conditions. One condition is that the intercorrelations of the 
items not be too high. For tests with very high item intercorrelations, 
maximizing the saturation will definitely not maximize the discriminating power 
(1). 

The second restriction is that the original nucleus be more than two 
items, say, three or four. This insures the test against being too highly 
diverted in the direction of the unique content of any one or two items. 

The third restriction is that any item excluded from the test at any 
stage shall not be considered for inclusion at a later stage. The purpose of 
this restriction is to prevent “functional drift” of the test, that is, inclusion 
of items measuring function A, then those measuring functions A and B, 
then those measuring B alone. Apparently this restriction is sufficiently 
stringent so that items unrelated to the central factor in the test can scarcely 
be included. Without this restriction items having no relation to the original 
nucleus might in some cases be included. The restriction is not strong enough 
to insure that no group factors exist among the items. The existence of group 
factors will raise the saturation but lower the extent to which the saturation 
(or, more properly, KR 20) can be considered a measure of the proportion 
of first factor variance. 

The discriminating power of a multiple-score test has one more aspect 
than the discriminating power of a single test, i.e., the degree of independence 
of the subtests. In this connection the Jackson and Ferguson (3) derivation 
of KR 20 shows that KR 20 is equal to the correlation between two tests 
which have the same mean inter-item covariance, when the mean covariance 
between two tests is equal to the mean covariance within each. On the basis 
of this relationship, the upper limit of the correlation between two tests 
should be approximately the geometric mean of their saturations. When 
two or more tests are found whose intercorrelations are almost equal to their 
saturations, those tests are considered a new pool of items and subtests are 
again constructed beginning with new nuclei. In the application of the 
Jackson-Ferguson relationship, the difference between KR 20 and the satura- 
tion coefficient is of little importance, as the ratio of the two coefficients is 
almost one, and attention is paid to the order of magnitude rather than the 
exact value of the saturation. 

After the most highly saturated tests are constituted from the several 
pools of items, and after the most highly related tests are reconstituted or 
combined, there remain several possibilities for attenuating the discriminating 
power. An item may have been omitted because it did not fall in the original 
pool of items from which the test was drawn. An item may be included in a 
test even though it is equally or more closely related to another test. The 
discriminating power of the test can be increased by adding some items to 
subtests and dropping others. The aim is to make the intercorrelations low 
rather than exactly zero, since the latter is generally not possible without 
sacrificing test saturation. 

Increase in the rigor of the present method of test construction might 
lie in the direction of evaluating the difference between the proportion of 
first-factor variance and the proportion of common-factor variance. KR 20 
is an upper limit of the former and at least an approximate lower limit of 
the latter. The smaller the difference between the two, the purer the test. 


II. Method 


For the present method items were either given as dichotomous or 
reduced to dichotomous form. There were not many items with very high 
intercorrelations. Since the sampling errors involved were known only roughly, 
a large number of cases was required. The use of exactly 1000 cases saves 
many hours of labor, since all divisions by N may be accomplished by shifting 
the decimal place. Cross-validation data from one study appear to indicate 
that useful results might be obtained with as few as 300 cases. 

The method was originally devised for constructing homogeneous keys 
for a biographical inventory; however, it can be used as well on other types 
of data, such as interest tests or multiphasic personality tests. The method 
can be used for the discovery of traits or of types of people; there appear to 
be no assumptions which limit it in this respect. 

Ideally the method should be used with the matrix of the covariances of 
every item with every other one. With large pools of items, there are mechani- 
cal complications in obtaining and handling such a matrix. The present 
cycling method was evolved to handle large numbers of items without com- 
puting the complete matrix of covariances. 

The first step is reading the test and formulation of hypotheses as to 
possible interrelations of the items. Items are then grouped according to 
these hypotheses and apparent similarity of content. 


Maximizing the Saturation of a Test 


The procedure for maximizing the saturation of a test is as follows: 
From the matrix of inter-item covariances of a given group of items, the 
triplet of items with highest covariances inter se is chosen as a nucleus. 
These three items comprise a test. All items are discarded from consideration 
which would lower the saturation of that three-item test. The one item is 
added which will maximize the saturation of the resultant four-item test. 
Then all remaining items which would lower the saturation of that four-item 
test are discarded, and the one is added which will maximize the saturation 
of the resultant five-item test, and so on. The process terminates when all 
items are either included in the test or excluded from the pool. 
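
The procedure just described can be sketched as a short routine. This is an illustration under assumptions, not the worksheet method of the paper: the nucleus is taken to be the triplet with the largest sum of its three pairwise covariances (one reading of "highest covariances inter se"), the full covariance matrix is assumed to be available, and the names `saturation` and `build_subtest` are hypothetical.

```python
from itertools import combinations


def saturation(cov, items):
    """Saturation 2C / (V + 2C) of the test formed by the listed items."""
    c = sum(cov[i][j] for i, j in combinations(items, 2))
    v = sum(cov[i][i] for i in items)
    return 2.0 * c / (v + 2.0 * c)


def build_subtest(cov, pool):
    """Grow one subtest from a three-item nucleus, discarding as it goes."""
    pool = list(pool)
    # Assumed nucleus rule: triplet with the largest summed covariance.
    nucleus = max(combinations(pool, 3),
                  key=lambda t: sum(cov[i][j] for i, j in combinations(t, 2)))
    test = list(nucleus)
    candidates = [k for k in pool if k not in test]
    while candidates:
        current = saturation(cov, test)
        # Items that would lower the saturation are excluded permanently.
        candidates = [k for k in candidates
                      if saturation(cov, test + [k]) >= current]
        if not candidates:
            break
        # Add the single item giving the highest resulting saturation.
        best = max(candidates, key=lambda k: saturation(cov, test + [k]))
        test.append(best)
        candidates.remove(best)
    return test
```

Because discarded items never re-enter the candidate list, the routine respects the restriction that an item excluded at any stage is not reconsidered later.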

In order to maximize the saturation one need only maximize a simpler 
quantity, 

$${}_nW_t = \sum_{i<j=1}^{n} C_{ij} \Big/ \sum_{i=1}^{n} V_i, \tag{3}$$

in which the subscript t on the ratio W means that it is a property of the 
test, and the prescript n refers to the number of items in the test. The quantity 
${}_nW_t$, which might be called the “covariance ratio,” changes every time an 
item is added to the test. 

The proof that maximizing ${}_nW_t$ will maximize the saturation is simple.
The saturation is a quantity of the form 2C/(V + 2C), where the capitals 
without subscripts are used to designate the sums rather than the elements 
of the sum. To maximize the saturation one needs only to minimize its 
reciprocal, (V + 2C)/2C = (V/2C) + 1. As constants may be disregarded, 
one needs to minimize V/C, or maximize C/V. 

The next step is to find a criterion for the exclusion of items. Let us
define a ratio ${}_nW_k$ characterizing each item k not included in the test:

$${}_nW_k = \sum_{i=1}^{n} C_{ik} \Big/ V_k , \tag{4}$$

where the subscript k indicates that the W is a property of the item k, and the
prescript means that there are n items in the test, k not being one of the
first n items in the test. It can be shown that an item k will not lower the
saturation of the test if

$${}_nW_k \ge {}_nW_t . \tag{5}$$

The proof of this statement is as follows. One wishes to find the property
of that item k which, when added to the test, will not lower the $W_t$ ratio.
This condition may be expressed:

$${}_{n+1}W_t \ge {}_nW_t .$$

Substituting from equation (3), we have

$$\left(\sum_{i<j=1}^{n} C_{ij} + \sum_{i=1}^{n} C_{ik}\right) \Big/ \left(\sum_{i=1}^{n} V_i + V_k\right) \ge \sum_{i<j=1}^{n} C_{ij} \Big/ \sum_{i=1}^{n} V_i .$$

Since all variances are positive, we may multiply by the denominators
without changing the sign of the inequality:

$$\sum_{i=1}^{n} V_i \left(\sum_{i<j=1}^{n} C_{ij} + \sum_{i=1}^{n} C_{ik}\right) \ge \sum_{i<j=1}^{n} C_{ij} \left(\sum_{i=1}^{n} V_i + V_k\right).$$

Cancelling like terms and dividing again by the variance terms yields

$$\sum_{i=1}^{n} C_{ik} \Big/ V_k \ge \sum_{i<j=1}^{n} C_{ij} \Big/ \sum_{i=1}^{n} V_i .$$

Thus the inequality expressed in formula (5) is established. The same proof
may be used to show that item k will lower the saturation if ${}_nW_k < {}_nW_t$.
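
The selection procedure and the exclusion criterion of formula (5) can be sketched in Python as follows. This is a minimal illustration, not the author's worksheet routine: the function names are hypothetical, the nucleus is supplied by the caller rather than searched for, and the data are the five items whose variances and covariances appear in Table 1.

```python
from itertools import combinations

def make_cov(variances, covariances):
    """Build a symmetric dict-of-dicts covariance matrix keyed by item label."""
    cov = {i: {i: v} for i, v in variances.items()}
    for (i, j), c in covariances.items():
        cov[i][j] = c
        cov[j][i] = c
    return cov

def sums(cov, items):
    """Return (C, V): sum of inter-item covariances and sum of item variances."""
    c = sum(cov[i][j] for i, j in combinations(items, 2))
    v = sum(cov[i][i] for i in items)
    return c, v

def covariance_ratio(cov, items):
    """Formula (3): the test covariance ratio nWt = C / V."""
    c, v = sums(cov, items)
    return c / v

def saturation(cov, items):
    """Saturation 2C / (V + 2C), which rises and falls with C / V."""
    c, v = sums(cov, items)
    return 2 * c / (v + 2 * c)

def item_ratio(cov, items, k):
    """Formula (4): covariance ratio nWk of an item k outside the test."""
    return sum(cov[i][k] for i in items) / cov[k][k]

def build_test(cov, nucleus):
    """Greedy construction: discard items failing formula (5), add the best one, repeat."""
    test = list(nucleus)
    pool = [k for k in cov if k not in test]
    while pool:
        w_t = covariance_ratio(cov, test)
        pool = [k for k in pool if item_ratio(cov, test, k) >= w_t]
        if not pool:
            break
        best = max(pool, key=lambda k: covariance_ratio(cov, test + [k]))
        test.append(best)
        pool.remove(best)
    return test

# Variances and covariances as printed in Table 1.
variances = {109: .2447, 117: .2483, 110: .2183, 124: .1124, 95: .2402}
covariances = {
    (109, 117): .0850, (109, 110): .0775, (109, 124): .0309, (109, 95): .0482,
    (117, 110): .0642, (117, 124): .0418, (117, 95): .0320,
    (110, 124): .0395, (110, 95): .0371, (124, 95): .0197,
}
cov = make_cov(variances, covariances)
final_test = build_test(cov, nucleus=[109, 117, 110])
final_ratio = covariance_ratio(cov, final_test)
```

With this small pool the sketch adds item 124 and then item 95, and the successive covariance ratios agree with the leftmost column of Table 1 to three decimals.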

Worksheets for constructing tests by the present method are shown
in Tables 1 and 2, which must be constructed simultaneously. The right
side of Table 1 consists of a table of covariances for items included in the test.


TABLE 1
Synthesis of Test Statistics: A Sample Table

                                      Covariance with item
 nWt     ΣVi      Vi     Item     117            110            124            95
                 .2447   109     .0850          .0775          .0309          .0482
                 .2483   117                    .0642          .0418          .0320
 .319   .7113    .2183   110              3ΣC = .2267          .0395          .0371
 .411   .8237    .1124   124                             4ΣC = .3389          .0197
 .447  1.0639    .2402    95                                            5ΣC = .4759


After the original nucleus of items is chosen, the first three covariances
are entered on the right side of Table 1. Their sum is entered in the (3, 3) 
cell of the principal diagonal. The variances of the first three items are entered 
in the first column to the left of the vertical item identification, and the sum 
of the first three variances is entered in the next column leftward. The first 
test covariance ratio is entered in the leftmost column; it is equal to the ratio 
of the sum of the first three covariances to the sum of the first three variances. 

TABLE 2
Pool of Items: A Sample Table

 Item            59       69       70       95       96      124
 Vk            .2485    .1957    .2481    .2402    .2491    .1124
 3ΣC           .0882    .0645    .0738    .1174    .0767    .1122
 3Wk           .355     .330     .297     .489     .308     .998
 Trial 4Wt     .328     .321     out      .362     out      .411
 4ΣC           .0987    .0721     --   [illegible]   --      in
 4Wk           .397  [illegible]  --   [illegible]   --      --
 Trial 5Wt     out   [illegible]  --   [illegible]   --      --


At this stage it is convenient to have Table 2 drawn up but no entries
made in it. For each item in turn the quantity ${}_3W_k$ is now computed. If the
covariance ratio for the item exceeds the covariance ratio for the test, then 
the identifying symbol of the item is entered in the first row, its variance is 
entered in the same column, second row, and the sum of its covariances 
with the first three items is entered in the same column, third row. This step 
is completed for the entire original matrix of items. Most of the items will be 
rejected at this step and thus will not appear in either table. 

The next step is to compute a trial ${}_4W_t$ for each item in Table 2. The
trial ${}_4W_t$ is equal to the sum of covariances of the test plus the sum of
covariances of the item, divided by the sum of variances for the test plus the
item variance. The values for the test are found in Table 1; the corresponding
values for the item are found in the appropriate column of Table 2.

The item which has the highest trial ${}_4W_t$ is selected as the fourth test
item. Its covariances with the three items already in the test are entered in
the right side of Table 1, and its variance is entered in the column of Table 1
labelled $V_i$. The three covariances just entered in the table are now added
to the previous total, found in cell (3, 3), and the new total is entered in
cell (4, 4). The new sum of variances is obtained by adding the new variance
to the previous sum of variances. The new test covariance ratio, ${}_4W_t$, is
obtained by dividing the sum of covariances by the sum of variances. The
value obtained should check exactly with the corresponding value in the
"Trial ${}_4W_t$" row of Table 2. It will be convenient to draw a heavy line down
the column of Table 2 corresponding to the item selected for the test.

For each item a new sum of covariances is obtained by adding its
covariance with the fourth item to its previous sum of covariances. The values
are entered in the row of Table 2 labelled ${}_4\Sigma C$. The sum of covariances
for each item is divided by its variance. These covariance ratios need not be
recorded, but for those items where the ratio is less than the test covariance
ratio, an indication must be made that the item no longer is in the pool.
For those items remaining in the pool, a trial ${}_5W_t$ is computed, and so on.
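
One pass of the worksheet can be sketched in Python as below. The function name and the dictionary layout are assumptions made for illustration; the figures are the test sums from Table 1 and the pool entries from the first rows of Table 2.

```python
def worksheet_step(test_c, test_v, pool):
    """One worksheet pass.

    test_c, test_v: sum of covariances and sum of variances of the current test.
    pool: maps item label -> (item variance, sum of its covariances with the test).
    Returns item label -> trial covariance ratio, or None when the item is "out".
    """
    w_t = test_c / test_v
    trial = {}
    for item, (v_k, c_k) in pool.items():
        if c_k / v_k < w_t:
            trial[item] = None  # item ratio below test ratio: drop from pool
        else:
            trial[item] = (test_c + c_k) / (test_v + v_k)
    return trial

# Three-item test from Table 1 and the pool of Table 2.
pool = {
    59: (.2485, .0882), 69: (.1957, .0645), 70: (.2481, .0738),
    95: (.2402, .1174), 96: (.2491, .0767), 124: (.1124, .1122),
}
trial = worksheet_step(test_c=.2267, test_v=.7113, pool=pool)
selected = max((k for k in trial if trial[k] is not None), key=lambda k: trial[k])
```

On these figures items 70 and 96 are marked out and item 124 is selected, as in the "Trial" row of Table 2.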

The possibility exists that when a test has been fully constituted, some 
items added early in the process may have ceased to contribute to the satura- 
tion. In order to test for this possibility, one may compute for each item the 
covariance ratio for that item with the test minus that item. If this ratio is 
less than the final ${}_nW_t$ of the test, that item no longer contributes to the
saturation of the test. The condition for excluding an item k which has been
included in a test may be expressed:

$$\sum_{i=1,\, i \ne k}^{n} C_{ik} \Big/ V_k < \sum_{i<j=1}^{n} C_{ij} \Big/ \sum_{i=1}^{n} V_i . \tag{6}$$

The proof is identical with that of formula (5) above.
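
The check of formula (6) might be sketched as follows. The function name is hypothetical, and the three-item example is invented purely to show an item being flagged; it is not data from the study.

```python
from itertools import combinations

def non_contributing_items(variances, covariances):
    """Return the items of a finished test that satisfy formula (6).

    variances: item label -> variance.
    covariances: (item, item) pair -> covariance, each pair listed once.
    """
    items = list(variances)

    def c(i, j):
        return covariances[(i, j)] if (i, j) in covariances else covariances[(j, i)]

    total_c = sum(c(i, j) for i, j in combinations(items, 2))
    w_t = total_c / sum(variances.values())  # final test covariance ratio
    flagged = []
    for k in items:
        ratio_k = sum(c(i, k) for i in items if i != k) / variances[k]
        if ratio_k < w_t:
            flagged.append(k)
    return flagged

# Hypothetical three-item test in which item 3 is weakly related to the others.
toy_variances = {1: .25, 2: .25, 3: .25}
toy_covariances = {(1, 2): .10, (1, 3): .01, (2, 3): .01}
flagged_toy = non_contributing_items(toy_variances, toy_covariances)

# The five-item test of Table 1: no item is flagged.
t1_variances = {109: .2447, 117: .2483, 110: .2183, 124: .1124, 95: .2402}
t1_covariances = {
    (109, 117): .0850, (109, 110): .0775, (109, 124): .0309, (109, 95): .0482,
    (117, 110): .0642, (117, 124): .0418, (117, 95): .0320,
    (110, 124): .0395, (110, 95): .0371, (124, 95): .0197,
}
flagged_table1 = non_contributing_items(t1_variances, t1_covariances)
```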


Construction of the Multiple-Score Test 


Cycle I keys are evolved from the a priori matrices by the method 
described above. After one key is constructed from a matrix, the entire 
original matrix is utilized in constructing further keys. It was thought de- 
sirable at first to exclude those items in the first key from consideration for 
later keys, but this course probably is disadvantageous. An item which is 
drawn into the first key as one of the last items may more properly appear
as one of the first items of a second key. It would then probably belong with
the second key. Items closely related to both keys may often best be omitted
from both, since they tend to raise the correlation between the two keys.

All items which are not included in any key are placed in a residual 
matrix. The residual matrix is treated the same way that the a priori matrices 
are treated; that is, the covariances of all items are obtained and the total 
matrix examined for new keys. The keys derived from the a priori matrices 
plus those derived from the residual matrix now constitute Cycle I keys. 
Cycle I keys are scored and correlated. The matrix of intercorrelations of 
Cycle I keys is examined for high values, say, values above .25 or .30. These 
are the correlations which must be reduced in order to have relatively inde- 
pendent tests, and insofar as possible this reduction must take place without 
impairing the saturation of the tests. 

If there are two or more keys which have correlations inter se approaching 
in magnitude their saturations, all of the items are placed in a new pool 
from which Cycle IA keys are constructed to replace the corresponding 
Cycle I keys. There may be two or more such groups of closely related keys. 
Each group of keys is, of course, treated separately. Cycle IA keys are
constructed by the method used for Cycle I tests. Cycle IA keys are now scored
and correlated with each other and with those of the Cycle I keys retained
without change.

The next step is to obtain the point biserial correlation of every key, 
i.e., the Cycle IA keys plus the Cycle I keys that were not replaced in Cycle 
IA, with every item in the original pool. These correlations comprise a matrix 
with one column for each key and one row for each item in the pool. In 
general, it is necessary to apply a correction to the point biserial between 
the item and its own key to compensate for the spurious correlation. In 
practice, for many items the outcome will by inspection either be so high or 
so low that the actual computations need not be made. The formula for this
correction is as follows:

$$r_{i(T-i)} = \frac{r_{iT}\,\sigma_T - \sigma_i}{\sqrt{\sigma_T^2 - 2 r_{iT}\,\sigma_i\sigma_T + \sigma_i^2}} , \tag{7}$$

where $r_{i(T-i)}$ is the corrected point biserial, $r_{iT}$ is the uncorrected point
biserial, and $\sigma_i$ and $\sigma_T$ represent standard deviations of item and key. A
useful approximation is given by:

$$r_{i(T-i)} \approx r_{iT} - \sigma_i/\sigma_T . \tag{8}$$
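
As a rough illustration of formulas (7) and (8), the sketch below compares the exact correction with the approximation. The function names and the numerical inputs (a point biserial of .50, an item standard deviation of .5, and a key standard deviation of 3) are assumptions chosen for the example, not values from the study.

```python
import math

def corrected_point_biserial(r_it, sd_item, sd_key):
    """Formula (7): correlation of an item with its key after removing the item."""
    numerator = r_it * sd_key - sd_item
    denominator = math.sqrt(sd_key**2 - 2 * r_it * sd_item * sd_key + sd_item**2)
    return numerator / denominator

def approx_corrected_point_biserial(r_it, sd_item, sd_key):
    """Formula (8): the simpler approximation r_iT - sd_item / sd_key."""
    return r_it - sd_item / sd_key

exact = corrected_point_biserial(0.50, 0.5, 3.0)
approx = approx_corrected_point_biserial(0.50, 0.5, 3.0)
```

For these inputs the exact value is about .36 and the approximation about .33, so the approximation is somewhat conservative here.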


The matrix of point biserials is utilized to drop items from or add items 
to keys, primarily to lower the correlations between keys, but in some cases 
also to raise the saturation. There are three major considerations in examining 
this matrix. The first is that every item should have its highest correlation 
with its own key. Items with fairly equal correlation with two or more keys 
are often best omitted entirely, since they are the items which raise the cor- 
relations between keys. Occasionally one will find that key A and key B 
will be positively correlated but that item i will enter A in a positive sense
and enter B in a negative sense. In this case inclusion of the item in both 
keys acts to lower the correlation between them. 

The second consideration is that some items not included in any Cycle I 
key may have a high correlation with just one of those keys. This will occur 
only when the item was not included in the matrix from which that test was 
drawn. Care must be taken not to add items to a key if those items will 
raise correlations which are already high. 

The third consideration is to lower high correlations between keys. 
For every pair of keys having a high correlation, say over .25, every item 
in both tests should be examined to see if it has a fairly high correlation 
with the key in which it is not included. Of course any items included in 
both keys would be dropped from one or both. In dropping items care should 
be taken not to deplete any test to the point where its saturation falls below 
.30 as a minimum.

When the complete matrix of covariances is available, the correlations 
between any key and any other or between a key and any item can quickly
be recomputed after each deletion or addition of an item to the key. When the
complete matrix of covariances is not available, the most practicable pro- 
cedure is to make whichever few changes for each test are most clearly 
indicated. After such changes the new tests are called the Cycle II tests. 
These tests are scored and correlated, and the biserial correlation of each 
test with each item is again obtained. The same considerations are applied 
to obtain Cycle III tests, and so on. The process terminates automatically 
when there are no further changes. 

The following formulas are useful in carrying out the cycling process.
If item i is not included in either test $T_1$ or test $T_2$, then adding i to $T_1$
will not raise the correlation between $T_1$ and $T_2$ if

$$r_{iT_2} \big/ \left(r_{iT_1} + \sigma_i/2\sigma_{T_1}\right) \le r_{T_1T_2} . \tag{9}$$

If item i is included in $T_1$, then the correlation between $T_1$ and $T_2$
will be lowered by dropping i from $T_1$ if

$$r_{iT_2} \big/ \left(r_{iT_1} - \sigma_i/2\sigma_{T_1}\right) \ge r_{T_1T_2} . \tag{10}$$

The item ratio of formula (4) can be obtained from the point biserial
correlation by means of the following formula:

$${}_nW_i = r_{iT}\,\sigma_T/\sigma_i . \tag{11}$$

Unity must be subtracted from the right-hand side if the item i is included
in the test T. This formula enables one to determine whether a given item
will lower the saturation of a key when the item was not included in the
matrix from which the key was drawn.
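
The three cycling formulas can be wrapped as small Python helpers, sketched below. The function names and the illustrative correlations are hypothetical, and the comparisons assume the bracketed denominators in formulas (9) and (10) are positive.

```python
def adding_does_not_raise(r_i1, r_i2, r_12, sd_item, sd_t1):
    """Formula (9): True if adding item i to T1 will not raise r(T1, T2)."""
    return r_i2 / (r_i1 + sd_item / (2 * sd_t1)) <= r_12

def dropping_lowers(r_i1, r_i2, r_12, sd_item, sd_t1):
    """Formula (10): True if dropping item i from T1 will lower r(T1, T2)."""
    return r_i2 / (r_i1 - sd_item / (2 * sd_t1)) >= r_12

def item_ratio_from_point_biserial(r_it, sd_item, sd_key, in_key=False):
    """Formula (11): item covariance ratio; subtract 1 if the item is in the key."""
    w = r_it * sd_key / sd_item
    return w - 1 if in_key else w

# Hypothetical item: sd .5, correlating .40 with T1 (sd 2.5); the keys correlate .20.
safe_to_add = adding_does_not_raise(r_i1=.40, r_i2=.05, r_12=.20, sd_item=.5, sd_t1=2.5)
risky_to_add = adding_does_not_raise(r_i1=.40, r_i2=.30, r_12=.20, sd_item=.5, sd_t1=2.5)
worth_dropping = dropping_lowers(r_i1=.40, r_i2=.30, r_12=.20, sd_item=.5, sd_t1=2.5)
w_outside = item_ratio_from_point_biserial(.30, sd_item=.5, sd_key=2.5)
w_inside = item_ratio_from_point_biserial(.30, sd_item=.5, sd_key=2.5, in_key=True)
```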

When the complete matrix of item covariances is available, it appears 
to be advantageous to begin by picking out all of the nuclei and to construct 
the subtests simultaneously. Each subtest will begin with a nucleus of three 
items, then a fourth will be added to each, then a fifth, and so on. When this 
method is followed, an item which is used in one test is not considered for 
others. Working from the complete matrix of covariances is probably more 
economical than following a cycling procedure in most cases. Machine 
techniques for handling large matrices of covariances will be presented in a 
later paper.


SOME APPLICATIONS OF ATTRIBUTE PREDICTION THEORY
TO PSYCHOPHYSICS 


NORMAN C. PERRY
SAN JOSE STATE COLLEGE 


The following development is a coordination of certain aspects of mental 
test methods and psychophysics. The biserial and triserial prediction equations 
of the Katzell-Cureton theory are utilized to reformulate the determination of 
absolute and differential limens from data produced by the constant method. 


Introduction 


Katzell and Cureton (4) describe a method for predicting the probability 
of an individual’s falling in one of two categories of a continuous dependent 
variable (y) from a knowledge of his score on an independent test variable 
X, as for example in the prediction of pilot success as a function of stanine 
test score. A solution to the converse of this problem (i.e., obtaining X as a 
function of a preassigned probability of inclusion in one of two categories) 
was developed by Guilford and Michael (2). The theory, as carried out 
in these two papers, has a natural application to the psychophysical method 
of constant stimuli. 

An extension to three categories of the entire theory was fully developed 
by Michael and Perry (5), the principal new feature yielded by the generali- 
zation being that for a given probability of inclusion in the middle category 
of three, there correspond two values of X. This is intuitively plausible 
since, in terms of an academic example, a student may be so bright that he 
has only one chance in four of getting a C in a course, or he may have so 
low an I.Q. that he has only one chance in four of getting a C. The essential
features of three-category methods are easily applied to the psychophysical 
method of constant stimulus difference. 

The entire line of development will be readily seen to relate other minor 
aspects of psychophysics to test theory. 

The mathematical essence of the Katzell-Cureton approach is to set 
up a least-squares equation relating y to X, on the assumptions of linearity 
of regression of the dichotomized variable on the independent variable, and 
normality and homoscedasticity of column array. It is further assumed 
that y (before dichotomization) is reasonably approximated by a unit normal 
variable. The correlation between y and X is, of course, a biserial one denoted
by the usual symbol $r_b$.

For prediction of y from X, therefore, we have

$$y_x = \frac{r_b}{\sigma_x}(X - M_x), \tag{1}$$

with standard error of estimate given by the equation

$$\sigma_{yx} = \sigma_y\sqrt{1 - r_b^2} . \tag{2}$$


These equations yield the predicted (mean) value and standard de- 
viation of y for any given value of X. The probability of an individual’s 
membership in a category is now determined by finding the proportions of 
area of the y-array which lie above or below the point of dichotomy. The 
point of dichotomy is expressed by the standard score of the point of division 
of the marginal distribution of y, a value which will be denoted hereafter 
by $z_y$.

Specifically, in order to ascertain the likelihood of membership in either
category for the X value under consideration, one obtains the difference
$z_y - y_x$ and divides this difference by the standard deviation of the array
to obtain the new standard score value

$$z_y' = \frac{z_y - y_x}{\sigma_{yx}} . \tag{3}$$

Finally, a table of normal curve areas applied to $z_y'$ yields the required
probabilities.
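
Equations (1) to (3) can be strung together in a few lines of Python, sketched below. The function name is hypothetical, the normal-curve table is replaced by the standard library's `NormalDist`, and $\sigma_y$ is taken as 1 because y is assumed to be a unit normal variable. The illustrative statistics are those of the aesthesiometer example worked later in this paper.

```python
from statistics import NormalDist

def prob_upper_category(x, mean_x, sd_x, r_b, z_y):
    """Probability that a case with score x falls above the point of dichotomy."""
    y_hat = (r_b / sd_x) * (x - mean_x)   # equation (1): predicted y
    sd_yx = (1 - r_b**2) ** 0.5           # equation (2) with sigma_y = 1
    z_prime = (z_y - y_hat) / sd_yx       # equation (3)
    return 1 - NormalDist().cdf(z_prime)  # area of the y-array above the cut

# Statistics from the aesthesiometer example: M_x = 10, sigma_x = 1.414,
# r_b = .905, z_y = .2845.
p_at_limen = prob_upper_category(10.444, 10, 1.414, .905, .2845)
p_at_8mm = prob_upper_category(8, 10, 1.414, .905, .2845)
p_at_12mm = prob_upper_category(12, 10, 1.414, .905, .2845)
```

At the limen value of 10.444 the two categories are about equally likely, and the probability rises with the stimulus value, which is the ogive form discussed in the text.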

The central concept involved in applying these equations to psycho- 
physics is to let X be the stimulus value, and y be the response value. Several 
immediately resulting minor syntheses of psychophysics and test theory 
should be pointed out. Such an approach unifies the ogive assumption in 
mental test theory of the probability of item success as a function of ability, 
with the phi-gamma hypothesis in the method of constant stimuli. Both 
of these follow immediately from the linear regression and normality of 
column array assumed in the mathematical theory just outlined. In addition,
the quantity $r_b$ can be interpreted as a measure of the sensitivity of sensory
discrimination for a given observer. 

Traditionally, the ability to discriminate in the psychophysical context 
has often been thought of as the reciprocal of the constant of proportionality 
K in Weber’s Law. For example, an individual who can detect a difference 
as small as 2% in lifted weights is much more sensitive than one who can 
detect only a difference of 10%. The former’s power of differentiation (as 
measured by 1/K) is to the latter’s as 50 is to 10. However, as an alternative 
rationale, a sensitive observer may be thought of as one who, on different 
occasions, tends to respond in nearly the same way to equal stimuli. Psy- 
chophysical data collected from the reports of such an observer would yield 
a regression of y on X with small column standard deviations. Thus, in
general, high sensitivity would be associated with large $r_b$, and low sensitivity
with a reduced correlation.


Application of Two-Category Prediction to the Method 
of Constant Stimuli 


In Guilford (1, 168) we find psychophysical data relating X (millimeters 
separation of an aesthesiometer) to p (the proportion of 100 trials in which a 
two-point experience was felt). In terms of these symbols the data are sum- 
marized below: 


X    12    11    10     9     8
p   .93   .66   .29   .05   .01


In Guilford and Michael (2) we find the converse of equation (1) developed as

$$X = M_x + \frac{z_y}{r_b}\,\sigma_x . \tag{4}$$


For the purpose of illustrating the principle of how an equation developed 
for the purpose of computing cutting-scores on a test can be used in psy- 
chophysics we apply equation (4) to the aesthesiometer data. 

It is readily calculated that $M_x$ = 10, and $\sigma_x$ = 1.414. From the data
the total of 194 = 93 + 66 + 29 + 5 + 1 indicates that of 500 applications
of the stimuli 38.8% gave rise to the two-point experience. Thus $z_y$, as the
standard-score value yielding a normal curve tail of .388, is equal to .2845.
Finally, $r_b$, as calculated from the standard formula $[(M_p - M_x)/\sigma_x](p/y)$,
where $M_p$ is the mean X for the two-point judgments and y here denotes the
normal-curve ordinate at $z_y$, is equal to .905.

Now, from psychophysical principles the limen value of X is that
corresponding to the point of dichotomy $z_y$ through equation (4), because
such an X value causes the probabilities of 1- or 2-point experience to be the
same. We have in test theory the corresponding "principle of equal likelihood"
which defines, for example, a critical cutting point in terms of stanine score 
as one which allows a candidate to have a .5 probability of being a successful 
pilot. 

From these considerations, we see that the desired stimulus limen is
obtained from substitution into equation (4) as X = 10 + (.2845/.905)(1.414)
= 10.444. This value differs by about .2 from the limens obtained
by the traditional process, which are all in the neighborhood of 10.6.
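
The arithmetic of this example can be retraced with the short Python sketch below. The variable names are illustrative. The separations of 8 to 12 mm are an assumption inferred from $M_x$ = 10 and $\sigma_x$ = 1.414 rather than read directly from the table, and the normal-curve table is replaced by the standard library's `NormalDist`.

```python
from statistics import NormalDist

# Assumed stimulus values (mm) paired with the observed proportions of
# two-point judgments; the separations are inferred, not quoted.
separations = [12, 11, 10, 9, 8]
p_two_point = [.93, .66, .29, .05, .01]

n = len(separations)
mean_x = sum(separations) / n
sd_x = (sum((x - mean_x) ** 2 for x in separations) / n) ** 0.5

p_total = sum(p_two_point) / n                 # overall proportion, .388
z_y = NormalDist().inv_cdf(1 - p_total)        # cut leaving an upper tail of .388
ordinate = NormalDist().pdf(z_y)               # normal ordinate at the cut

# Mean separation among the two-point judgments.
mean_p = sum(x * w for x, w in zip(separations, p_two_point)) / sum(p_two_point)

r_b = (mean_p - mean_x) / sd_x * p_total / ordinate   # biserial correlation
limen = mean_x + (z_y / r_b) * sd_x                   # equation (4)
```

Carried to more decimals than the hand computation, the sketch gives a limen within about .001 of the 10.444 reported in the text.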


Three-Category Prediction Applied to the Method 
of Constant Stimulus Differences. 

The three-category theory developed by Michael and Perry (5) produces 
the following pair of equations for critical cutting scores which are analogous 
to equation (4): 

$$X_i = M_x + \frac{z_{y_i}\,\sigma_x}{r_{tr}} \qquad (i = 1 \text{ or } 2). \tag{5}$$

Here $z_{y_1}$ is the standard score of the point of division on the marginal
distribution of y separating the upper category from the middle one, and similarly
$z_{y_2}$ separates the lowest category from the upper two. Correspondingly,
$X_1$ is the cutting score assuring an individual a .5 probability of membership
in the upper category, and a score of $X_2$ assures an individual a .5 probability
of inclusion in the lowest category. $r_{tr}$ is the triserial correlation between y and
X of the type developed by Jaspen (3).

In Guilford (1, 187) we find psychophysical data relating X (weight in 
grams) to p, g, and s (the respective proportions of 100 trials in which a given 
weight is judged greater than, equal to, and less than 200 grams). In terms 
of these symbols the data are summarized below: 


X   185   190   195   200   205   210   215
p   .05   .12   .15   .30   .55   .70   .85
q   .04   .18   .25   .42   .35   .18   .09
s   .91   .70   .60   .28   .10   .12   .06


It has been found in experiments of this type that the proportions p and s 
tend to approximate ogive functions of X, a finding in agreement with the 
previously discussed mathematical rationale of the ogive form in psycho- 
physics and test theory. It has also been established empirically that the 
proportion qg when graphed against X tends to plot as a normal curve. A 
reasonable explanation in terms of the mathematical foundation assumed is 
achieved if one notes the geometrical fact that two lines a fixed distance 
apart and perpendicular to the base of a normal curve will, when swept 
from left to right, cut off a succession of areas which are a symmetrical and 
roughly normal function of base line distance. 

From the data it is readily computed that $M_x$ = 200, $z_{y_1}$ = .2829,
and $\sigma_x$ = 10. From Jaspen's formula (3) we compute $r_{tr}$ to be .760, a statistic
which can reasonably be interpreted as a measure of the sensitivity of the
observer in a manner paralleling the previous development for $r_b$.

Hence

$$X_1 = 200 + \frac{(.283)(10)}{.760} = 203.72 .$$

In psychophysical terminology the upper difference limen $DL_u$ = 3.72 on
the assumption that the "point of subjective equality" is quite close to 200.
These results compare reasonably well with the $DL_u$'s obtained by traditional
methods, which range from 3.4 to 4.3. A process completely analogous to the
one demonstrated would permit the computation of the lower difference
limen $DL_l$.
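
The computation of the upper cutting score can be retraced with the Python sketch below. The variable names are illustrative. The stimulus weights of 185 to 215 grams in 5-gram steps and the proportion .30 at 200 grams are assumptions inferred from $M_x$ = 200, $\sigma_x$ = 10, and the requirement that p, q, and s sum to unity. The triserial correlation is simply taken from the text, since Jaspen's formula is not reproduced here.

```python
from statistics import NormalDist, mean, pstdev

weights = [185, 190, 195, 200, 205, 210, 215]     # grams (assumed spacing)
p_greater = [.05, .12, .15, .30, .55, .70, .85]   # proportion judged "greater"
r_tr = .760                                       # triserial r quoted in the text

mean_x = mean(weights)
sd_x = pstdev(weights)                            # population standard deviation

# Cut on the marginal distribution of y separating the upper category.
z_y1 = NormalDist().inv_cdf(1 - mean(p_greater))

x_1 = mean_x + z_y1 * sd_x / r_tr                 # equation (5), i = 1
dl_upper = x_1 - mean_x                           # upper difference limen
```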


The Method of Average Error and the Difference Limen 


Many workers in psychophysics have found that the probable error of 
observations as found in the method of average error is roughly proportional
to the DL as determined by other methods. The following development
gives a mathematical rationale for this empirical fact in terms of category 
prediction theory. 

From the concluding example of the previous section it is clear that
with appropriate assumptions the DL is equal to $z_y\sigma_x/r_{tr}$. To express the
S.E. of response in stimulus units one only needs to substitute $y_x = 1$ in
the regression equation $y_x = (r_{tr}/\sigma_x)(X - M_x)$ and solve for $x = X - M_x$.
Thus we have $1 = (r_{tr}/\sigma_x)\,x$, and S.E. $= \sigma_x/r_{tr}$. Hence P.E. $= .6745\,\sigma_x/r_{tr}$.

Let us now consider two different observers responding with different 
sensitivity to the same stimulus situation and denote the various mathe- 
matical symbols describing their response by prime and double-prime nota- 
tion. We have then 


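$$\frac{\text{P.E.}'}{\text{DL}'} = \frac{\text{P.E.}''}{\text{DL}''}, \qquad \text{or} \qquad \frac{.6745\,\sigma_x/r_{tr}'}{z_y'\,\sigma_x/r_{tr}'} = \frac{.6745\,\sigma_x/r_{tr}''}{z_y''\,\sigma_x/r_{tr}''} .$$

Upon cancellation we have $z_y' = z_y''$.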

Thus to account for the proportionality found by experiment one need 
make only the intuitively plausible assumption that relative to the dis- 
tribution of response for each observer the standard score of the point of 
dichotomy is always the same. 
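
A small numerical check of this argument is sketched below. The function names and the two observers' correlations are hypothetical; the point is only that the ratio of P.E. to DL reduces to $.6745/z_y$ and so does not depend on the observer's correlation.

```python
def probable_error(sd_x, r):
    """P.E. of response in stimulus units: .6745 * sigma_x / r."""
    return .6745 * sd_x / r

def difference_limen(z_y, sd_x, r):
    """DL under the category prediction theory: z_y * sigma_x / r."""
    return z_y * sd_x / r

# Two hypothetical observers of differing sensitivity, same point of dichotomy.
z_y, sd_x = .2829, 10
ratio_keen = probable_error(sd_x, .90) / difference_limen(z_y, sd_x, .90)
ratio_dull = probable_error(sd_x, .60) / difference_limen(z_y, sd_x, .60)
```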


An Approximate Method of Correcting the DL for Non-Homoscedasticity 


As pointed out in Woodworth (6) the empirical justification of Weber’s 
Law does not depend on defining the DL in terms of a .5 probability. Other 
probabilities (if not too extreme) give equally good results. However, for 
any probability other than .5 equation (5) must be more generally stated, 
because the mean of the column array no longer coincides with the point of 
dichotomy of the marginal distribution of y. 

As previously described and as summarized in equations (1), (2), and 
(3) Katzell and Cureton presented a solution to the problem of determining 
the probability of placement in a category for a given X value. They did 
not, however, undertake the converse problem. 

From equation (3),

$$z_y' = \frac{z_y - y_x}{\sigma_{yx}} ,$$

it is possible to derive a value of X which will permit the prediction of
category membership at any desired level of probability. By substituting
$(r_b/\sigma_x)(X - M_x)$ for $y_x$ [through use of equation (1)], and $\sigma_y\sqrt{1 - r_b^2}$ for
$\sigma_{yx}$ [through use of equation (2)], and recalling that $\sigma_y = 1$ for a unit normal
variable, equation (3) may be rewritten after some manipulation as

$$z_y' = \frac{z_y\,\sigma_x - r_b(X - M_x)}{\sigma_x\sqrt{1 - r_b^2}} . \tag{6}$$

By means of a few algebraic steps equation (6) may be transformed to
give X as a function of $z_y'$ as follows:

$$X = M_x + \frac{z_y\,\sigma_x}{r_b} - \frac{z_y'\,\sigma_x\sqrt{1 - r_b^2}}{r_b} . \tag{7}$$

In the present setting $z_y'$ is determined by the probability used to define
the difference limen. It will be noted that for p = .5, $z_y' = 0$, and equation
(7) is essentially the same as equation (5).

In test theory the assumption of homoscedasticity is a plausible one, 
but in psychophysics Weber’s Law implies that column variance is less 
for small stimuli than for large stimuli. To obtain a modified form of equation 
(7), more in line with these considerations, we multiply the column standard 
deviation $\sigma_{yx}$ by the factor $X/M_x$. Thus we have the new equation,

$$\sigma_{yx} = \sqrt{1 - r_{tr}^2}\,(X/M_x)\,\sigma_y . \tag{8}$$

We now follow through the formal steps used in developing equation (7),
employing $\sqrt{1 - r_{tr}^2}\,(X/M_x)$ in $\sigma_{yx}$ instead of $\sqrt{1 - r_b^2}$, and solve for X,
obtaining after lengthy algebra the new equations,

$$X_i = \frac{M_x^2\,r_{tr} + z_{y_i}\,\sigma_x M_x}{M_x\,r_{tr} + \sqrt{1 - r_{tr}^2}\;\sigma_x z_y'} \qquad (i = 1 \text{ or } 2). \tag{9}$$


In the development of (9) it is necessary to assume that the redistribu- 
tion of column variance introduced by equation (8) (i.e., less variance in 
columns below the mean and more column variance above the mean) does 
not influence seriously the properties of the regression line and the marginal 
distribution of y used in developing equation (7). Hence equation (9) can 
give approximate results only, and is intended merely to give some estimate 
of the influence of Weber’s Law on cutting points. 

With these understandings let us now compare the results yielded by 
equations (7) and (9) when applied to the weight-lifting data previously 
presented, and using an upper limen based on a probability of .7 of inclusion 
in the “greater” category. The value .7 was the limen probability used by 
Binet for mental age, and we choose this value to accentuate the parallel 
between test and psychophysical theory. 

For p = .7, $z_y' = -.5244$, and substituting this value and previously
computed statistics into equation (9) we obtain

$$X_1 = \frac{(200)^2(.760) + (.2829)(10)(200)}{(200)(.760) + \sqrt{1 - (.760)^2}\,(10)(-.5244)} = 208.4 .$$

A corresponding use of equation (7) yields

$$X = 200 + \frac{(.2829)(10)}{.760} + \frac{\sqrt{1 - (.760)^2}\,(10)(.5244)}{.760} = 208.2 .$$

Apparently, then, the estimated correction for the influence of Weber's
Law is roughly represented by the discrepancy between a $DL_u$ of 8.4 and
one of 8.2. The smallness of this difference (.2) is probably caused by the 
relatively small range of weights used (8 grams) compared to the average 
weight (200 grams). 
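
The two cutting-score formulas can be compared directly with the Python sketch below. The function names are hypothetical; the inputs are the statistics quoted in the text for the weight-lifting data.

```python
import math

def cutting_score_homoscedastic(mean_x, sd_x, r, z_y, z_prime):
    """Equation (7): cutting score assuming equal column variances."""
    return mean_x + (z_y * sd_x - z_prime * sd_x * math.sqrt(1 - r**2)) / r

def cutting_score_weber(mean_x, sd_x, r, z_y, z_prime):
    """Equation (9): cutting score with column SD scaled by X / M_x."""
    numerator = mean_x**2 * r + z_y * sd_x * mean_x
    denominator = mean_x * r + math.sqrt(1 - r**2) * sd_x * z_prime
    return numerator / denominator

# Statistics quoted in the text; z' = -.5244 corresponds to p = .7.
args = dict(mean_x=200, sd_x=10, r=.760, z_y=.2829)
x_eq7 = cutting_score_homoscedastic(z_prime=-.5244, **args)
x_eq9 = cutting_score_weber(z_prime=-.5244, **args)

# With z' = 0 (p = .5) both forms reduce to equation (5).
x_eq7_half = cutting_score_homoscedastic(z_prime=0.0, **args)
x_eq9_half = cutting_score_weber(z_prime=0.0, **args)
```

The sketch gives roughly 208.2 and 208.4 for the two forms, matching the values quoted above, and the two agree exactly when the limen is defined at a probability of .5.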


NOTES ON AN APPROXIMATION METHOD FOR FITTING
PARABOLIC EQUATIONS TO EXPERIMENTAL DATA*


A. CHAPANIS 
THE JOHNS HOPKINS UNIVERSITY 


When a numerical transformation of raw data is used only to simplify 
the arithmetic of curve fitting, the transformation may lead to undesirable and 
even highly distorted results. This principle is illustrated with an approxima- 
tion method of fitting parabolic equations to experimental data, as described 
recently in texts by Johnson and Lewis. Although the approximation method
will never yield as good fits as the exact, least-squares method, satisfactory
results are in general achieved whenever the transformed scores yield a linear
plot as a function of X. The principal difficulty with the method is that some
data which fall along a parabola may not yield a linear plot of the transformed
scores against X, and so cannot be fitted satisfactorily by the approximation
method.


Introduction 


The method of least squares is commonly used for fitting empirical 
and theoretical curves to experimental data. An important feature of this 
method is that it defines the "best-fitting" line for a set of data as that line
which minimizes the sum of the squared differences between observed and 
calculated values. Many problems of curve fitting also involve the use of 
some sort of numerical transformation of the data, the most common trans- 
formation in psychology, perhaps, being a logarithmic one. There are, how- 
ever, some important consequences of using numerical transformations in 
combination with the method of least squares for curve-fitting problems. 
These are pointed out briefly in an article by Mueller (3), but are generally 
ignored in most statistics textbooks written for psychologists. This note 
illustrates one of these consequences with a practical example. 

Briefly the issue is this: When data are first transformed and then
treated by the method of least squares, the sum of the squared differences
between observed and calculated values is no longer a minimum. The double
numerical treatment minimizes the sum of the squared differences between
observed and calculated transformed values. Sometimes, as in many
psychophysical problems, this is precisely the result the experimenter wants to
achieve. In many visual problems, for example, variability is constant with
respect not to arithmetic values of the stimuli, but rather to their logarithmic
transforms. In such instances, it is meaningful to use the transformed scores
directly in curve fitting. In other instances, however, the numerical
transformation may have no significance other than to simplify the arithmetic
involved in curve fitting. Under these circumstances, the use of a numerical
transformation may lead to undesirable or even to highly distorted results.
It is this point which will be illustrated in this note.

*This study was done in cooperation with the Systems Division, Naval Research
Laboratory, under Contract N5-ori-166, Task Order I, between the Office of Naval Research
and The Johns Hopkins University. This is Report No. 166-I-156, Project Designation
No. NR-507-470, under that contract. The author is indebted to Dr. Hermann von Schelling,
of the Naval Medical Research Laboratory, U. S. Naval Submarine Base, New London,
Connecticut, for technical advice. Miss Judith T. Parker and Mr. William T. Pollock
assisted capably in the tedious computations required for this note.


The Least-Squares Solution for a Parabolic Equation 


Let us consider the case of an investigator who wants to fit a parabolic 
equation to a set of experimental points. Given a set of N points defined 
by the rectangular coordinates, X, Y, the desired equation is of the form 


$$Y' = aX^2 + bX + c, \tag{1}$$


where Y’ is the value of Y predicted from equation (1), X is the abscissa 
value of the several empirical points, and a, b, and c are constants to be 
determined. 

The least-squares solution for the constants, a, b, and c, is straightforward
but tedious because it involves three simultaneous equations

$$\begin{aligned}
\sum X^2Y &= a\sum X^4 + b\sum X^3 + c\sum X^2 , \\
\sum XY &= a\sum X^3 + b\sum X^2 + c\sum X , \\
\sum Y &= a\sum X^2 + b\sum X + cN ,
\end{aligned} \tag{2}$$

and the computation of seven sums: $\sum X^2Y$, $\sum XY$, $\sum Y$, $\sum X^4$, $\sum X^3$,
$\sum X^2$, and $\sum X$, some of which make use of higher powers of X. (See any
of a number of textbooks, e.g., Peters and Van Voorhis [4, 429-431], for
arithmetic details.) An important feature of this solution, however, is that
it minimizes the sum of the squared differences between the observed and
predicted Y-values, i.e., $\sum (Y - Y')^2$ is a minimum. This statement may
be paraphrased for emphasis as follows: Any other solution for the three 
constants, a, b, and c, must of necessity yield a poorer fit than the least- 
squares solution outlined above. For this reason, we shall refer to the least- 
squares solution as the exact method throughout the rest of this note. 
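
The three simultaneous equations (2) can be checked numerically. What follows
is a minimal Python sketch and not part of the original note: the function
names `fit_parabola_exact` and `sum_sq_residuals` and the sample points are
illustrative assumptions, and Cramer's rule is used only because the system
is 3 x 3.

```python
def fit_parabola_exact(xs, ys):
    """Solve the normal equations (2) for a, b, c in Y' = aX^2 + bX + c."""
    n = len(xs)
    # The seven sums named in the text.
    sx = sum(xs)
    sx2 = sum(x ** 2 for x in xs)
    sx3 = sum(x ** 3 for x in xs)
    sx4 = sum(x ** 4 for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x * x * y for x, y in zip(xs, ys))

    def det3(m):
        # Determinant of a 3 x 3 matrix by cofactor expansion.
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    lhs = [[sx4, sx3, sx2],
           [sx3, sx2, sx],
           [sx2, sx, n]]
    rhs = [sx2y, sxy, sy]
    d0 = det3(lhs)
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column by the right-hand side.
        m = [row[:] for row in lhs]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d0)
    return tuple(coeffs)


def sum_sq_residuals(xs, ys, a, b, c):
    """Sum of squared differences between observed and predicted Y."""
    return sum((y - (a * x * x + b * x + c)) ** 2 for x, y in zip(xs, ys))


# Hypothetical points lying exactly on Y = -X^2 + 10X + 2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [-x * x + 10 * x + 2 for x in xs]
a, b, c = fit_parabola_exact(xs, ys)
ssr = sum_sq_residuals(xs, ys, a, b, c)
print(a, b, c, ssr)
```

Because the sample points lie exactly on a parabola, the sketch should recover
its constants and leave a residual sum of squares of essentially zero.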

Recently Lewis, in his excellent manual Quantitative Methods in Psychology,
has described an approximation method for fitting parabolic equations
(2, 77-79). In his new textbook Johnson (1, 124-127) also makes use of the
same transformation as Lewis does, although Johnson, unlike Lewis, combines
the transformation with the method of averages and not the method of least
squares. However, the essential criticisms which this note makes about
the technique hold for Johnson’s use of the method as well. For most practical
problems, the technique is a good one, and it has two important advantages
to commend it: (1) it is much simpler to use than the exact technique
discussed above, and (2) it provides a “reduction test” for the empirical
data. Unfortunately, neither author points out that the method is only an
approximation, and neither discusses the exact least-squares solution
described above. This note shows that under certain special circumstances,
the “reduction test” aspect of the approximation will fail with the result
that the technique will yield equations which fit parabolic data very poorly
or not at all.


A Description of the Approximation Method 

First, let us look at the essential features of Lewis’ solution. We want
to fit an equation of the type

$$Y'' = dX^2 + eX + f. \tag{3}$$

This is the same equation as (1) above, but we have replaced $Y'$ by $Y''$,
and the constants, a, b, and c, by d, e, and f, in order to keep Lewis’ method
distinct from the exact least-squares method. Now choose any point,
$X_1, Y_1$, from the experimental data and form the equation

$$Y_1 = dX_1^2 + eX_1 + f. \tag{4}$$

Subtract this expression from the immediately preceding one, thus

$$Y'' - Y_1 = d(X^2 - X_1^2) + e(X - X_1). \tag{5}$$

This can be rewritten

$$\frac{Y'' - Y_1}{X - X_1} = \frac{d(X^2 - X_1^2)}{X - X_1} + e \tag{6}$$

or

$$\frac{Y'' - Y_1}{X - X_1} = (e + dX_1) + dX. \tag{7}$$

In this last expression, e and $dX_1$ are constants and so their sum may be
replaced by the single constant g. In addition, let us define

$$Z' = \frac{Y'' - Y_1}{X - X_1}. \tag{8}$$

Thus equation (7) reduces to

$$Z' = g + dX, \tag{9}$$

the equation for a straight line.

How does one fit parabolas with this technique? First, we start with a set
of empirical points, X, Y. Choose any convenient point, $X_1, Y_1$, and form,
for each set of experimental coordinates, the value
$Z = (Y - Y_1)/(X - X_1)$. Now we have pairs of coordinates, X, Z. Use the
least-squares solution for a straight line to find the constants g and d.
Solve for e by the equation

$$e = g - dX_1, \tag{10}$$

and for f by the equation

$$f = Y_1 - dX_1^2 - eX_1. \tag{11}$$

Equation (11) comes from the identity in equation (4).

Note, incidentally, that if the $X_1, Y_1$ selected is an experimental point,
the least-squares solution for a straight line uses N - 1 instead of N points,
because when $X = X_1$, Z is indeterminate. As suggested later by Lewis
(2, 80), however, $X_1, Y_1$ may be a point selected from a curve drawn
freehand through the data. In this case, of course, there are N points
available for the least-squares solution.
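
The procedure just described can be sketched in a few lines of Python. This is
an illustration under stated assumptions, not Lewis’ own computing scheme: the
function name `fit_parabola_approx` and the sample points are hypothetical,
and any point whose abscissa equals that of the chosen point is skipped
because Z is indeterminate there.

```python
def fit_parabola_approx(xs, ys, x1, y1):
    """Approximation method: fit Z = g + dX, then recover d, e, f."""
    # Transform each point to (X, Z) with Z = (Y - Y1) / (X - X1).
    pts = [(x, (y - y1) / (x - x1)) for x, y in zip(xs, ys) if x != x1]
    n = len(pts)
    sum_x = sum(x for x, _ in pts)
    sum_z = sum(z for _, z in pts)
    sum_xx = sum(x * x for x, _ in pts)
    sum_xz = sum(x * z for x, z in pts)
    # Least-squares straight line through the (X, Z) pairs.
    d = (n * sum_xz - sum_x * sum_z) / (n * sum_xx - sum_x ** 2)
    g = (sum_z - d * sum_x) / n
    e = g - d * x1                    # equation (10)
    f = y1 - d * x1 ** 2 - e * x1     # from the identity in equation (4)
    return d, e, f


# Hypothetical points lying exactly on Y = -X^2 + 10X + 2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [-x * x + 10 * x + 2 for x in xs]
# Take the third experimental point as X1, Y1; five points remain for the line.
d, e, f = fit_parabola_approx(xs, ys, xs[2], ys[2])
print(d, e, f)
```

With points that fall exactly on a parabola the transformed values lie exactly
on a straight line, so the sketch should return the constants of that parabola.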

The simplicity of the approximation method comes principally from the
fact that it involves only the two simultaneous equations required for a
straight line,

$$
\begin{aligned}
\sum XZ &= d \sum X^2 + g \sum X \\
\sum Z &= d \sum X + gN
\end{aligned}
\tag{12}
$$

and the computation of only four sums, $\sum XZ$, $\sum Z$, $\sum X^2$, and
$\sum X$, none of which makes use of a power higher than 2.

It is readily apparent, however, that the approximation can never be
as good as the exact solution, because $\sum (Y - Y'')^2$ can never be as
small as $\sum (Y - Y')^2$. The reason, of course, is that the approximation
minimizes not $\sum (Y - Y'')^2$, but rather $\sum (Z - Z')^2$. From equation
(8) it is apparent that Z is a complex numerical transformation involving a
ratio, two variables, and two constants. Some important features of the
approximation technique are the following:


(1) In general, there will always be a discrepancy between the approxi- 
mate and the exact parabola. However, if the N empirical points lie exactly 
on a parabola, the approximation yields the true solution. It follows that 
the approximation is better, the closer the given points fall around a true 
parabola. 

(2) The approximate parabola always goes through the point $X_1, Y_1$.
That this is so is apparent from equations (3) and (4). When X is equal to
$X_1$, $Y''$ is equal to $Y_1$. Thus, the closer the point, $X_1, Y_1$, is to
the best parabola, the better is the approximation. Conversely, the more
distant the point, $X_1, Y_1$, from the best parabola, the poorer is the
approximation.

(3) The approximation overweights points close to $X_1, Y_1$ and
underweights points distant from it. The weights are proportional to
$1/|X - X_1|$. For this reason, $X_1$ should generally be selected from the
center of the X-range.

(4) Although the approximation method does not minimize $\sum (Y - Y'')^2$,
it satisfies the equation $\sum (Y - Y'') = 0$.

(5) The approximation solution is invariant under certain simple
transformations. Equation (8) shows that the addition (or subtraction) of the
same constant to every X- and/or Y-coordinate does not change Z, and so
will not change the value of $\sum (Y - Y'')^2$. The addition of a constant to
the X- (or Y-) coordinates merely shifts the origin to the right or left (or
up or down) without affecting the relationship of the plotted points to one
another.

Although it is perhaps not immediately apparent, it is true nonetheless 
that multiplying every X-coordinate by the same constant leaves the solution 
essentially unchanged, i.e., $\sum (Y - Y'')^2$ is unaffected, even though the
Z-values and the constants d and e do change. What happens in this instance 
may perhaps be made clear if it is recalled that multiplying each X-coordinate 
by a constant is equivalent to stretching (or compressing) the scale along 
the abscissa without affecting the vertical distances along the ordinate. 
It is easy to visualize that such a stretching will not alter the vertical distance 
of an observed point from a predicted point, and hence the squared residual 
will remain unaltered. 

Finally, we can visualize the effect of multiplying the Y-coordinates 
by a constant, K. This will stretch or compress the Y-scale. Each Y-residual 
will be increased K times, and each squared residual will be increased $K^2$
times. Thus, multiplying every Y-coordinate by the constant, K, yields
a final $\sum (Y - Y'')^2$ value which is $K^2$ times the original one.
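
Statement (4) in the list above is given without proof. One way to see it,
offered here as a short derivation sketch and not as the author's own
argument, is to rewrite the two straight-line normal equations (12) in terms
of the residuals $Z - Z'$ and combine them. The sums run over the points used
in the straight-line fit; if $X_1, Y_1$ is itself an experimental point, its
residual is zero because the parabola passes through it.

```latex
\begin{align*}
Z - Z' &= \frac{Y - Y_1}{X - X_1} - \frac{Y'' - Y_1}{X - X_1}
        = \frac{Y - Y''}{X - X_1}, \\
\sum (Z - Z') &= 0, \qquad \sum X (Z - Z') = 0
        \quad \text{(equations (12), rearranged)}, \\
\sum (Y - Y'') &= \sum (X - X_1)(Z - Z')
        = \sum X (Z - Z') - X_1 \sum (Z - Z') = 0.
\end{align*}
```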


A Practical Illustration 

As a practical illustration of the principles discussed above, we shall use
the set of raw data appearing in columns (1) and (2) of Table 1. A plot of
these points, and of the best-fitting parabola computed by means of the
three simultaneous equations (2), is shown in Figure 1. The equation of
the parabola is

$$Y' = -.9570X^2 + 9.670X + 29.62.$$

$\sum (Y - Y')^2$ computed from this equation is 219.2.


TABLE 1

Parabolic Equations Fit to a Set of Data by the Approximation Method*

  (1)    (2)     (3)                                          (4)
   X      Y      Best-Fitting Equations by                    $\sum (Y - Y'')^2$
                 Lewis’ Approximation Method

  11.2    24      Y'' =  1.025X^2 - 17.79X + 94.69            11,960
  11.0    16      Y'' =   .6897X^2 - 14.36X + 90.51           10,590
  10.0    33    † Y'' = - .8790X^2 + 9.234X + 28.56              271.9
   9.0    34      Y'' = - .6276X^2 + 4.646X + 43.02              734.9
   8.2    40      Y'' = - .9332X^2 + 7.696X + 39.64              804.8
   7.8    50      Y'' = -1.438X^2 + 14.62X + 23.49               693.6
   6.7    51    † Y'' = - .9355X^2 + 9.257X + 30.98              225.7
   4.9    51      Y'' = - .6229X^2 + 4.116X + 45.78            1,029.
   4.6    58      Y'' = -1.148X^2 + 10.08X + 35.92               917.8
   3.8    53    † Y'' = - .9673X^2 + 9.595X + 30.51              226.9
   3.0    54      Y'' = -1.238X^2 + 12.02X + 29.08               510.7
   2.0    43    † Y'' = - .8740X^2 + 9.318X + 27.86              303.1
   1.5    42    † Y'' = - .9034X^2 + 9.106X + 30.37              224.8
    .5    39      Y'' = -1.640X^2 + 14.71X + 32.05             2,588.
    .1    25      Y'' = -1.981X^2 + 19.03X + 23.12             3,185.

*Columns (1) and (2) contain the raw data; equations of the best-fitting
parabolas when each pair of coordinates is taken as $X_1, Y_1$ in Lewis’
approximation method appear in column (3); and $\sum (Y - Y'')^2$ for each
equation is given in column (4). Daggers (†) identify the approximate
solutions which do not differ markedly from the exact, least-squares solution.

Column (3) of Table 1 gives the 15 parabolic equations obtained when
each pair of empirical coordinates is taken successively as the $X_1, Y_1$ in
the approximation method, and Figures 2, 3, and 4 each show curves for
three of the equations computed by the approximation. In each figure the
points are those shown in Figure 1 and listed in columns (1) and (2) of
Table 1. The solid curve is the best-fitting parabola shown in Figure 1. The
interrupted lines are three solutions achieved by the approximation method.
Arrows identify the points used as $X_1, Y_1$ in the approximation solutions.

Finally, column (4) of Table 1 gives the $\sum (Y - Y'')^2$ values for each
of the 15 parabolic equations computed by the approximation. The constants
of the equations in Table 1 have been rounded off to four significant digits
from intermediate calculations which were carried out to more significant
figures. This is more accuracy than is usually warranted by this kind of
problem, and rounding off intermediate calculations will change the values
somewhat. Actually, all computations were performed independently by two
persons, one of whom carried out intermediate calculations to many
significant figures, the other of whom rounded off intermediate calculations
to no more than four significant digits. The final discrepancies in the
$\sum (Y - Y'')^2$ values between the two computers generally amounted to less
than 1 per cent and never exceeded 3 per cent.

Table 1 confirms what we should have expected on theoretical grounds:
in every instance the approximations yield $\sum (Y - Y'')^2$ values which are
greater than the $\sum (Y - Y')^2$ resulting from the exact formula. Further,
only 5 of the 15 approximate equations (those marked by daggers) are
even reasonably good fits to the data. Note that for all five of these
equations the $X_1, Y_1$ values are close to the best-fitting parabola (these
are the points designated by arrows in Figure 1), thus agreeing with the
second statement above. The remaining 10 of the 15 approximate equations
yield $\sum (Y - Y'')^2$ values which are two to fifty times as large as the
$\sum (Y - Y')^2$ value obtained from the exact equation. In a few instances
(see Figure 3) the approximation method yields equations which are obviously
bad fits to the data; and, in this example, two fits (Figure 4) are manifestly
absurd. Note that the very bad fits occur when $X_1, Y_1$ is selected from
either end of the X-range, thus agreeing with the third statement of the
preceding section. Finally, we can see from Figures 2, 3, and 4 that the
equations achieved by the approximation method always pass through
$X_1, Y_1$.

A Limitation of the Approximation Method


Having worked through an illustrative example, we are now in a better
position to analyze the true nature of the limitation which applies to the
approximation method. [Incidentally, the fact that our illustrative example
included the vertex of a parabola (see Figure 1), whereas Lewis’ example deals
with only one leg of a parabola (2, 78), is not the source of the difficulty.]
In describing his transformation Lewis (2, 77) states, “... equation (3)
may be used to represent a set of experimental data if a plot of

    $[X, (Y - Y_1)/(X - X_1)]$

approximates a straight line.” In general, this is true. If Z in our
terminology [see equation (8)] yields a linear plot as a function of X, the
approximation method can be used to fit a parabola to the data. The fit will
never be so good as the fit achieved by the exact, least-squares solution for
the reasons given earlier but, on the other hand, it will not be far off.

Perhaps the most important limitation of the approximation method is 
that some data which fall along a parabola will not yield linear plots of 
X, Z. As a result, the experimenter may conclude that a parabolic equation 
is not the one to use for his data. If, however, he decides to go ahead and 
fit a parabola anyway, he will find that the equation resulting from the use 
of the approximation will fit the data very poorly. This is exactly what 
happened to produce the 10 poor fits in the example used in the preceding 
section. 

Under what circumstances will parabolic data not yield a linear plot
of X, Z? In general, this will occur when $X_1, Y_1$ is located near some
other point which deviates moderately in Y-value from $Y_1$. The reason for
this is contained in the third statement made earlier about the approximation
method. Points close to $X_1, Y_1$ are weighted disproportionately when they
are transformed to Z-values.

To illustrate the nature and importance of the overweighting, consider
the curves in Figure 5. The solid curve is of the equation

$$Y = -X^2 + 10X + 2.$$

Let us suppose that we had a set of data which could be described by this
curve but that there was a small amount of variability in the data as
indicated by the dashed boundary lines. Now let us take the point, X = 0,
Y = 2, as our $X_1, Y_1$ and plot Z as a function of X for the limits of
variability shown in Figure 5. The results appear in Figure 6. Note the
enormous variability in Z for X-values close to $X_1$. This extreme distortion
of Z occurs for values of X close to $X_1$ irrespective of where $X_1$ lies
within the X-range. If $X_1$ is selected near the center of the X-range,
however, there is a balancing of Z-values with little dispersion on both ends
of the X-range.
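
The size of this distortion can be made explicit with a short worked equation.
It is a sketch under two assumptions that the note does not state: an observed
ordinate differs from the parabola value $Y_p$ by an amount $\delta$, and the
chosen point $X_1, Y_1$ itself lies on the parabola. The value $\delta = 1$ and
the two abscissas are illustrative only.

```latex
\[
Z = \frac{(Y_p + \delta) - Y_1}{X - X_1}
  = Z_p + \frac{\delta}{X - X_1},
\qquad
Z_p = \frac{Y_p - Y_1}{X - X_1}.
\]
\[
X_1 = 0,\ \delta = 1: \quad
\frac{\delta}{X - X_1} = 20 \ \text{at } X = 0.05,
\qquad
\frac{\delta}{X - X_1} = 0.2 \ \text{at } X = 5.
\]
```

The same deviation in Y thus moves Z a hundred times farther at X = 0.05 than
at X = 5, which is the overweighting described in the third statement.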

To illustrate more fully the importance of this distortion in curve fitting,
take the five points whose coordinates are: 0, 2; .03, 1; 1, 11; 3, 23; and
5, 27. These points are shown in Figure 7. If we select the point 0, 2 as our
$X_1, Y_1$, and plot Z as a function of X, we have the result shown in
Figure 8. Thus, although Figure 7 shows that the five points lie on an almost
perfect parabola, Figure 8 shows that Z is by no means a linear function of X.

[Figure caption fragment: "Some Hypothetical Data"; Figure 6]

[FIGURE 7. Hypothetical Data Fit by the Exact Least-Squares Method (Solid
Line), and by the Approximation Method (Dashed Line)]

[FIGURE 8. Z-Transformations of the Points in Figure 7 for $X_1 = 0$,
$Y_1 = 2$]

Fitting a straight line to the points in Figure 8 gives the solution

$$Z' = 5.60X - 15.7.$$

The parabola corresponding to this equation is

$$Y'' = 5.60X^2 - 15.7X + 2.00,$$

and is plotted as the dashed line in Figure 7. The result is highly
unsatisfactory solely because of the extreme weighting of the single point
.03, 1, when it is transformed into Z (Figure 8). The solid line in Figure 8
best fits the four points there, but the best-fitting parabola would require
that the line describing Z as a function of X be more nearly like the dashed
one in Figure 8.

This, of course, is an extreme example, but, as we have seen from the 
previous illustration, it is a circumstance which could arise with practical 
experimental data. 
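
The five-point example can be rerun with a short Python sketch. The variable
names are illustrative assumptions. As a yardstick the sketch uses the
parabola Y = -X^2 + 10X + 2 given earlier, which is not the exact
least-squares fit; the exact fit can only have a residual sum of squares as
small or smaller.

```python
# The five hypothetical points from the text, with X1, Y1 = (0, 2).
points = [(0.0, 2.0), (0.03, 1.0), (1.0, 11.0), (3.0, 23.0), (5.0, 27.0)]
x1, y1 = points[0]

# Z-transform of the remaining four points: Z = (Y - Y1) / (X - X1).
xz = [(x, (y - y1) / (x - x1)) for x, y in points if x != x1]
n = len(xz)
sum_x = sum(x for x, _ in xz)
sum_z = sum(z for _, z in xz)
sum_xx = sum(x * x for x, _ in xz)
sum_xz = sum(x * z for x, z in xz)

# Least-squares straight line Z' = g + dX through the four (X, Z) pairs.
d = (n * sum_xz - sum_x * sum_z) / (n * sum_xx - sum_x ** 2)
g = (sum_z - d * sum_x) / n
e = g - d * x1                    # equation (10)
f = y1 - d * x1 ** 2 - e * x1     # from the identity in equation (4)


def ssr(coeffs):
    """Sum of squared Y-residuals of all five points about a parabola."""
    a2, a1, a0 = coeffs
    return sum((y - (a2 * x * x + a1 * x + a0)) ** 2 for x, y in points)


ssr_approx = ssr((d, e, f))
ssr_curve = ssr((-1.0, 10.0, 2.0))   # yardstick parabola, not the exact fit

print(f"Z'  = {d:.2f}X + ({g:.1f})")
print(f"Y'' = {d:.2f}X^2 + ({e:.1f})X + {f:.2f}")
print(f"approximation SSR = {ssr_approx:.1f}, yardstick SSR = {ssr_curve:.2f}")
```

Run as written, the slope and intercept should agree with the values quoted
above to the rounding shown, and the residual sum of squares of the
approximate parabola should be far larger than that of the yardstick curve.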